From patchwork Wed Dec 4 07:46:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893318
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org, Thomas Gleixner
Subject: [PATCH net-next v2 01/39] ktime: Add us_to_ktime()
Date: Wed, 4 Dec 2024 07:46:29 +0000
Message-ID: <20241204074710.990092-2-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Add a us_to_ktime() helper to go with ms_to_ktime() and ns_to_ktime().

Signed-off-by: David Howells
cc: Thomas Gleixner
cc: Jakub Kicinski
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/linux/ktime.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 3a4e723eae0f..383ed9985802 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -222,6 +222,11 @@ static inline ktime_t ns_to_ktime(u64 ns)
 	return ns;
 }
 
+static inline ktime_t us_to_ktime(u64 us)
+{
+	return us * NSEC_PER_USEC;
+}
+
 static inline ktime_t ms_to_ktime(u64 ms)
 {
 	return ms * NSEC_PER_MSEC;
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 02/39] rxrpc: Fix handling of received connection abort
Date: Wed, 4 Dec 2024 07:46:30 +0000
Message-ID: <20241204074710.990092-3-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Fix the handling of a connection abort that we've received. Though the
abort is at the connection level, it needs propagating to the calls on that
connection. Whilst the propagation bit is performed, the calls aren't then
woken up to go and process their termination, and as no further input is
forthcoming, they just hang.

Also add some tracing for the logging of connection aborts.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 25 +++++++++++++++++++++++++
 net/rxrpc/conn_event.c       | 12 ++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index d03e0bd8c028..27c23873c881 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -117,6 +117,7 @@
 #define rxrpc_call_poke_traces \
 	EM(rxrpc_call_poke_abort,		"Abort")	\
 	EM(rxrpc_call_poke_complete,		"Compl")	\
+	EM(rxrpc_call_poke_conn_abort,		"Conn-abort")	\
 	EM(rxrpc_call_poke_error,		"Error")	\
 	EM(rxrpc_call_poke_idle,		"Idle")		\
 	EM(rxrpc_call_poke_set_timeout,		"Set-timo")	\
@@ -282,6 +283,7 @@
 	EM(rxrpc_call_see_activate_client,	"SEE act-clnt") \
 	EM(rxrpc_call_see_connect_failed,	"SEE con-fail") \
 	EM(rxrpc_call_see_connected,		"SEE connect ") \
+	EM(rxrpc_call_see_conn_abort,		"SEE conn-abt") \
 	EM(rxrpc_call_see_disconnected,		"SEE disconn ") \
 	EM(rxrpc_call_see_distribute_error,	"SEE dist-err") \
 	EM(rxrpc_call_see_input,		"SEE input ") \
@@ -981,6 +983,29 @@ TRACE_EVENT(rxrpc_rx_abort,
 		      __entry->abort_code)
 	    );
 
+TRACE_EVENT(rxrpc_rx_conn_abort,
+	    TP_PROTO(const struct rxrpc_connection *conn, const struct sk_buff *skb),
+
+	    TP_ARGS(conn, skb),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	conn)
+		    __field(rxrpc_serial_t,	serial)
+		    __field(u32,		abort_code)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->conn = conn->debug_id;
+		    __entry->serial = rxrpc_skb(skb)->hdr.serial;
+		    __entry->abort_code = skb->priority;
+			   ),
+
+	    TP_printk("C=%08x ABORT %08x ac=%d",
+		      __entry->conn,
+		      __entry->serial,
+		      __entry->abort_code)
+	    );
+
 TRACE_EVENT(rxrpc_rx_challenge,
 	    TP_PROTO(struct rxrpc_connection *conn, rxrpc_serial_t serial,
 		     u32 version, u32 nonce, u32 min_level),
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 598b4ee389fc..2a1396cd892f 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -63,11 +63,12 @@ int rxrpc_abort_conn(struct rxrpc_connection *conn, struct sk_buff *skb,
 /*
  * Mark a connection as being remotely aborted.
  */
-static bool rxrpc_input_conn_abort(struct rxrpc_connection *conn,
+static void rxrpc_input_conn_abort(struct rxrpc_connection *conn,
 				   struct sk_buff *skb)
 {
-	return rxrpc_set_conn_aborted(conn, skb, skb->priority, -ECONNABORTED,
-				      RXRPC_CALL_REMOTELY_ABORTED);
+	trace_rxrpc_rx_conn_abort(conn, skb);
+	rxrpc_set_conn_aborted(conn, skb, skb->priority, -ECONNABORTED,
+			       RXRPC_CALL_REMOTELY_ABORTED);
 }
 
 /*
@@ -202,11 +203,14 @@ static void rxrpc_abort_calls(struct rxrpc_connection *conn)
 
 	for (i = 0; i < RXRPC_MAXCALLS; i++) {
 		call = conn->channels[i].call;
-		if (call)
+		if (call) {
+			rxrpc_see_call(call, rxrpc_call_see_conn_abort);
 			rxrpc_set_call_completion(call,
 						  conn->completion,
 						  conn->abort_code,
 						  conn->error);
+			rxrpc_poke_call(call, rxrpc_call_poke_conn_abort);
+		}
 	}
 
 	_leave("");
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 03/39] rxrpc: Use umin() and umax() rather than min_t()/max_t() where possible
Date: Wed, 4 Dec 2024 07:46:31 +0000
Message-ID: <20241204074710.990092-4-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Use umin() and umax() rather than min_t()/max_t() where the type specified
is an unsigned type.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/call_event.c  |  5 ++---
 net/rxrpc/call_object.c |  4 ++--
 net/rxrpc/conn_client.c |  2 +-
 net/rxrpc/input.c       | 13 +++++--------
 net/rxrpc/insecure.c    |  2 +-
 net/rxrpc/io_thread.c   |  2 +-
 net/rxrpc/output.c      |  2 +-
 net/rxrpc/rtt.c         |  6 +++---
 net/rxrpc/rxkad.c       |  6 +++---
 net/rxrpc/rxperf.c      |  2 +-
 net/rxrpc/sendmsg.c     |  2 +-
 11 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 7bbb68504766..c4754cc9b8d4 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -233,8 +233,7 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)
 
 static bool rxrpc_tx_window_has_space(struct rxrpc_call *call)
 {
-	unsigned int winsize = min_t(unsigned int, call->tx_winsize,
-				     call->cong_cwnd + call->cong_extra);
+	unsigned int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra);
 	rxrpc_seq_t window = call->acks_hard_ack, wtop = window + winsize;
 	rxrpc_seq_t tx_top = call->tx_top;
 	int space;
@@ -467,7 +466,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 	} else {
 		unsigned long nowj = jiffies, delayj, nextj;
 
-		delayj = max(nsecs_to_jiffies(delay), 1);
+		delayj = umax(nsecs_to_jiffies(delay), 1);
 		nextj = nowj + delayj;
 		if (time_before(nextj, call->timer.expires) ||
 		    !timer_pending(&call->timer)) {
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index f9e983a12c14..0df647d1d3a2 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -220,9 +220,9 @@ static struct rxrpc_call *rxrpc_alloc_client_call(struct rxrpc_sock *rx,
 		__set_bit(RXRPC_CALL_EXCLUSIVE, &call->flags);
 
 	if (p->timeouts.normal)
-		call->next_rx_timo = min(p->timeouts.normal, 1);
+		call->next_rx_timo = umin(p->timeouts.normal, 1);
 	if (p->timeouts.idle)
-		call->next_req_timo = min(p->timeouts.idle, 1);
+		call->next_req_timo = umin(p->timeouts.idle, 1);
 	if (p->timeouts.hard)
 		call->hard_timo = p->timeouts.hard;
 
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index bb11e8289d6d..86fb18bcd188 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -231,7 +231,7 @@ static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn)
 	distance = id - id_cursor;
 	if (distance < 0)
 		distance = -distance;
-	limit = max_t(unsigned long, atomic_read(&rxnet->nr_conns) * 4, 1024);
+	limit = umax(atomic_read(&rxnet->nr_conns) * 4, 1024);
 	if (distance > limit)
 		goto mark_dont_reuse;
 
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 16d49a861dbb..49e35be7dc13 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -44,8 +44,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 
 	if (test_and_clear_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags)) {
 		summary->retrans_timeo = true;
-		call->cong_ssthresh = max_t(unsigned int,
-					    summary->flight_size / 2, 2);
+		call->cong_ssthresh = umax(summary->flight_size / 2, 2);
 		cwnd = 1;
 		if (cwnd >= call->cong_ssthresh &&
 		    call->cong_mode == RXRPC_CALL_SLOW_START) {
@@ -113,8 +112,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 
 	change = rxrpc_cong_begin_retransmission;
 	call->cong_mode = RXRPC_CALL_FAST_RETRANSMIT;
-	call->cong_ssthresh = max_t(unsigned int,
-				    summary->flight_size / 2, 2);
+	call->cong_ssthresh = umax(summary->flight_size / 2, 2);
 	cwnd = call->cong_ssthresh + 3;
 	call->cong_extra = 0;
 	call->cong_dup_acks = 0;
@@ -206,9 +204,8 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call)
 	rxrpc_inc_stat(call->rxnet, stat_tx_data_cwnd_reset);
 	call->tx_last_sent = now;
 	call->cong_mode = RXRPC_CALL_SLOW_START;
-	call->cong_ssthresh = max_t(unsigned int, call->cong_ssthresh,
-				    call->cong_cwnd * 3 / 4);
-	call->cong_cwnd = max_t(unsigned int, call->cong_cwnd / 2, RXRPC_MIN_CWND);
+	call->cong_ssthresh = umax(call->cong_ssthresh, call->cong_cwnd * 3 / 4);
+	call->cong_cwnd = umax(call->cong_cwnd / 2, RXRPC_MIN_CWND);
 }
 
 /*
@@ -709,7 +706,7 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb
 		call->tx_winsize = rwind;
 	}
 
-	mtu = min(ntohl(trailer->maxMTU), ntohl(trailer->ifMTU));
+	mtu = umin(ntohl(trailer->maxMTU), ntohl(trailer->ifMTU));
 
 	peer = call->peer;
 	if (mtu < peer->maxdata) {
diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
index 6716c021a532..751eb621021d 100644
--- a/net/rxrpc/insecure.c
+++ b/net/rxrpc/insecure.c
@@ -19,7 +19,7 @@ static int none_init_connection_security(struct rxrpc_connection *conn,
  */
 static struct rxrpc_txbuf *none_alloc_txbuf(struct rxrpc_call *call, size_t remain, gfp_t gfp)
 {
-	return rxrpc_alloc_data_txbuf(call, min_t(size_t, remain, RXRPC_JUMBO_DATALEN), 1, gfp);
+	return rxrpc_alloc_data_txbuf(call, umin(remain, RXRPC_JUMBO_DATALEN), 1, gfp);
 }
 
 static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c
index 07c74c77d802..7af5adf53b25 100644
--- a/net/rxrpc/io_thread.c
+++ b/net/rxrpc/io_thread.c
@@ -558,7 +558,7 @@ int rxrpc_io_thread(void *data)
 		}
 
 		timeout = nsecs_to_jiffies(delay_ns);
-		timeout = max(timeout, 1UL);
+		timeout = umax(timeout, 1);
 		schedule_timeout(timeout);
 		__set_current_state(TASK_RUNNING);
 		continue;
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 5ea9601efd05..85112ea31a39 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -118,7 +118,7 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call,
 	txb->kvec[1].iov_len = ack->nAcks;
 
 	wrap = RXRPC_SACK_SIZE - sack;
-	to = min_t(unsigned int, ack->nAcks, RXRPC_SACK_SIZE);
+	to = umin(ack->nAcks, RXRPC_SACK_SIZE);
 
 	if (sack + ack->nAcks <= RXRPC_SACK_SIZE) {
 		memcpy(sackp, call->ackr_sack_table + sack, ack->nAcks);
diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c
index cdab7b7d08a0..6dc51486b5a6 100644
--- a/net/rxrpc/rtt.c
+++ b/net/rxrpc/rtt.c
@@ -27,7 +27,7 @@ static u32 __rxrpc_set_rto(const struct rxrpc_peer *peer)
 
 static u32 rxrpc_bound_rto(u32 rto)
 {
-	return min(rto, RXRPC_RTO_MAX);
+	return umin(rto, RXRPC_RTO_MAX);
 }
 
 /*
@@ -91,11 +91,11 @@ static void rxrpc_rtt_estimator(struct rxrpc_peer *peer, long sample_rtt_us)
 		/* no previous measure. */
 		srtt = m << 3;		/* take the measured time to be rtt */
 		peer->mdev_us = m << 1;	/* make sure rto = 3*rtt */
-		peer->rttvar_us = max(peer->mdev_us, rxrpc_rto_min_us(peer));
+		peer->rttvar_us = umax(peer->mdev_us, rxrpc_rto_min_us(peer));
 		peer->mdev_max_us = peer->rttvar_us;
 	}
 
-	peer->srtt_us = max(1U, srtt);
+	peer->srtt_us = umax(srtt, 1);
 }
 
 /*
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 48a1475e6b06..e3194d73dd84 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -150,11 +150,11 @@ static struct rxrpc_txbuf *rxkad_alloc_txbuf(struct rxrpc_call *call, size_t rem
 	struct rxrpc_txbuf *txb;
 	size_t shdr, space;
 
-	remain = min(remain, 65535 - sizeof(struct rxrpc_wire_header));
+	remain = umin(remain, 65535 - sizeof(struct rxrpc_wire_header));
 
 	switch (call->conn->security_level) {
 	default:
-		space = min_t(size_t, remain, RXRPC_JUMBO_DATALEN);
+		space = umin(remain, RXRPC_JUMBO_DATALEN);
 		return rxrpc_alloc_data_txbuf(call, space, 1, gfp);
 	case RXRPC_SECURITY_AUTH:
 		shdr = sizeof(struct rxkad_level1_hdr);
@@ -164,7 +164,7 @@ static struct rxrpc_txbuf *rxkad_alloc_txbuf(struct rxrpc_call *call, size_t rem
 		break;
 	}
 
-	space = min_t(size_t, round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN), remain + shdr);
+	space = umin(round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN), remain + shdr);
 	space = round_up(space, RXKAD_ALIGN);
 
 	txb = rxrpc_alloc_data_txbuf(call, space, RXKAD_ALIGN, gfp);
diff --git a/net/rxrpc/rxperf.c b/net/rxrpc/rxperf.c
index 085e7892d310..7ef93407be83 100644
--- a/net/rxrpc/rxperf.c
+++ b/net/rxrpc/rxperf.c
@@ -503,7 +503,7 @@ static int rxperf_process_call(struct rxperf_call *call)
 				reply_len + sizeof(rxperf_magic_cookie));
 
 	while (reply_len > 0) {
-		len = min_t(size_t, reply_len, PAGE_SIZE);
+		len = umin(reply_len, PAGE_SIZE);
 		bvec_set_page(&bv, ZERO_PAGE(0), len, 0);
 		iov_iter_bvec(&msg.msg_iter, WRITE, &bv, 1, len);
 		msg.msg_flags = MSG_MORE;
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 6abb8eec1b2b..b04afb5df241 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -360,7 +360,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
 
 		/* append next segment of data to the current buffer */
 		if (msg_data_left(msg) > 0) {
-			size_t copy = min_t(size_t, txb->space, msg_data_left(msg));
+			size_t copy = umin(txb->space, msg_data_left(msg));
 
 			_debug("add %zu", copy);
 			if (!copy_from_iter_full(txb->kvec[0].iov_base + txb->offset,
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 04/39] rxrpc: Clean up Tx header flags generation handling
Date: Wed, 4 Dec 2024 07:46:32 +0000
Message-ID: <20241204074710.990092-5-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Clean up the generation of the header flags when building packet headers
for transmission:

 (1) Assemble the flags in a local variable rather than in the txb->flags.

 (2) Do the flags masking and JUMBO-PACKET setting in one bit of code for
     both the main header and the jumbo headers.

 (3) Generate the REQUEST-ACK flag afresh each time. There's a possibility
     we might want to do jumbo retransmission packets in future.

 (4) Pass the local flags variable to the rxrpc_tx_data tracepoint rather
     than the combination of the txb flags and the wire header flags (the
     latter belong only to the first subpacket).

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h |  1 -
 net/rxrpc/ar-internal.h      |  2 +-
 net/rxrpc/output.c           | 18 ++++++++++++------
 net/rxrpc/proc.c             |  3 +--
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 27c23873c881..62064f63d6eb 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -452,7 +452,6 @@
 
 #define rxrpc_req_ack_traces \
 	EM(rxrpc_reqack_ack_lost,		"ACK-LOST ")	\
-	EM(rxrpc_reqack_already_on,		"ALREADY-ON")	\
 	EM(rxrpc_reqack_more_rtt,		"MORE-RTT ")	\
 	EM(rxrpc_reqack_no_srv_last,		"NO-SRVLAST")	\
 	EM(rxrpc_reqack_old_rtt,		"OLD-RTT ")	\
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index d0fd37bdcfe9..fcdfbc1d5aaf 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -110,7 +110,7 @@ struct rxrpc_net {
 	atomic_t		stat_tx_acks[256];
 	atomic_t		stat_rx_acks[256];
 
-	atomic_t		stat_why_req_ack[8];
+	atomic_t		stat_why_req_ack[7];
 
 	atomic_t		stat_io_loop;
 };
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 85112ea31a39..50d5f2a02458 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -330,6 +330,8 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t
 	struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base;
 	enum rxrpc_req_ack_trace why;
 	struct rxrpc_connection *conn = call->conn;
+	bool last;
+	u8 flags;
 
 	_enter("%x,{%d}", txb->seq, txb->len);
 
@@ -339,6 +341,10 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t
 	    txb->seq == 1)
 		whdr->userStatus = RXRPC_USERSTATUS_SERVICE_UPGRADE;
 
+	txb->flags &= ~RXRPC_REQUEST_ACK;
+	flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS;
+	last = txb->flags & RXRPC_LAST_PACKET;
+
 	/* If our RTT cache needs working on, request an ACK. Also request
 	 * ACKs if a DATA packet appears to have been lost.
 	 *
@@ -346,9 +352,7 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t
 	 * service call, lest OpenAFS incorrectly send us an ACK with some
 	 * soft-ACKs in it and then never follow up with a proper hard ACK.
 	 */
-	if (txb->flags & RXRPC_REQUEST_ACK)
-		why = rxrpc_reqack_already_on;
-	else if ((txb->flags & RXRPC_LAST_PACKET) && rxrpc_sending_to_client(txb))
+	if (last && rxrpc_sending_to_client(txb))
 		why = rxrpc_reqack_no_srv_last;
 	else if (test_and_clear_bit(RXRPC_CALL_EV_ACK_LOST, &call->events))
 		why = rxrpc_reqack_ack_lost;
@@ -367,15 +371,17 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t
 
 	rxrpc_inc_stat(call->rxnet, stat_why_req_ack[why]);
 	trace_rxrpc_req_ack(call->debug_id, txb->seq, why);
-	if (why != rxrpc_reqack_no_srv_last)
+	if (why != rxrpc_reqack_no_srv_last) {
 		txb->flags |= RXRPC_REQUEST_ACK;
+		flags |= RXRPC_REQUEST_ACK;
+	}
 
 dont_set_request_ack:
-	whdr->flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS;
+	whdr->flags = flags;
 	whdr->serial = htonl(txb->serial);
 	whdr->cksum = txb->cksum;
 
-	trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags, false);
+	trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, false);
 }
 
 /*
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 263a2251e3d2..3b7e34dd4385 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -519,9 +519,8 @@ int rxrpc_stats_show(struct seq_file *seq, void *v)
 		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DELAY]),
 		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE]));
 	seq_printf(seq,
-		   "Why-Req-A: acklost=%u already=%u mrtt=%u ortt=%u\n",
+		   "Why-Req-A: acklost=%u mrtt=%u ortt=%u\n",
 		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_ack_lost]),
-		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_already_on]),
 		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_more_rtt]),
 		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt]));
 	seq_printf(seq,
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 05/39] rxrpc: Don't set the MORE-PACKETS rxrpc wire header flag
Date: Wed, 4 Dec 2024 07:46:33 +0000
Message-ID: <20241204074710.990092-6-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

The MORE-PACKETS rxrpc header flag hasn't actually been looked at by
anything since 1988 and not all implementations generate it.

Change rxrpc so that it doesn't set MORE-PACKETS at all rather than setting
it inconsistently.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/sendmsg.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index b04afb5df241..546abb463c3f 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -385,9 +385,6 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, (msg_data_left(msg) == 0 && !more)) { if (msg_data_left(msg) == 0 && !more) txb->flags |= RXRPC_LAST_PACKET; - else if (call->tx_top - call->acks_hard_ack < - call->tx_winsize) - txb->flags |= RXRPC_MORE_PACKETS; ret = call->security->secure_packet(call, txb); if (ret < 0)

From patchwork Wed Dec 4 07:46:34 2024
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 06/39] rxrpc: Show stats counter for received reason-0 ACKs
Date: Wed, 4 Dec 2024 07:46:34 +0000
Message-ID: <20241204074710.990092-7-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

In /proc/net/rxrpc/stats, show the stats counter for received ACKs that
have the reason code set to 0, as some implementations send such ACKs.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 3b7e34dd4385..cdf32f0d8e0e 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -508,7 +508,7 @@ int rxrpc_stats_show(struct seq_file *seq, void *v) atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_DELAY]), atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_IDLE])); seq_printf(seq, - "Ack-Rx : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u\n", + "Ack-Rx : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u z=%u\n", atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_REQUESTED]), atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DUPLICATE]), atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_OUT_OF_SEQUENCE]), @@ -517,7 +517,8 @@ int rxrpc_stats_show(struct seq_file *seq, void *v) atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING]), atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING_RESPONSE]), atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DELAY]), - atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE])); + atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE]), + atomic_read(&rxnet->stat_rx_acks[0])); seq_printf(seq, "Why-Req-A: acklost=%u mrtt=%u ortt=%u\n", atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_ack_lost]),

From patchwork Wed Dec 4 07:46:35 2024
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 07/39] rxrpc: Request an ACK on impending Tx stall
Date: Wed, 4 Dec 2024 07:46:35 +0000
Message-ID: <20241204074710.990092-8-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Set the REQUEST-ACK flag on the DATA packet we're about to send if
transmission is about to stall because the application layer isn't keeping
up with supplying us with data to transmit.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 1 + net/rxrpc/ar-internal.h | 2 +- net/rxrpc/output.c | 7 ++++++- net/rxrpc/proc.c | 5 +++-- 4 files changed, 11 insertions(+), 4 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 62064f63d6eb..d86b5f07d292 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -452,6 +452,7 @@ #define rxrpc_req_ack_traces \ EM(rxrpc_reqack_ack_lost, "ACK-LOST ") \ + EM(rxrpc_reqack_app_stall, "APP-STALL ") \ EM(rxrpc_reqack_more_rtt, "MORE-RTT ") \ EM(rxrpc_reqack_no_srv_last, "NO-SRVLAST") \ EM(rxrpc_reqack_old_rtt, "OLD-RTT ") \ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index fcdfbc1d5aaf..d0fd37bdcfe9 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -110,7 +110,7 @@ struct rxrpc_net { atomic_t stat_tx_acks[256]; atomic_t stat_rx_acks[256]; - atomic_t stat_why_req_ack[7]; + atomic_t stat_why_req_ack[8]; atomic_t stat_io_loop; }; diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 50d5f2a02458..b93a5d50be3e 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -330,7 +330,7 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; enum rxrpc_req_ack_trace why; struct rxrpc_connection *conn = call->conn; - bool last; + bool last, more; u8 flags; _enter("%x,{%d}", txb->seq, txb->len); @@ -345,6 +345,9 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS; last = txb->flags & RXRPC_LAST_PACKET; + more = (!list_is_last(&txb->call_link, &call->tx_buffer) || + !list_empty(&call->tx_sendmsg)); + /* If our RTT cache needs working on, request an ACK. Also request * ACKs if a DATA packet appears to have been lost. 
* @@ -366,6 +369,8 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t why = rxrpc_reqack_more_rtt; else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), ktime_get_real())) why = rxrpc_reqack_old_rtt; + else if (!last && !more) + why = rxrpc_reqack_app_stall; else goto dont_set_request_ack; diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index cdf32f0d8e0e..ce4d48bdfbe9 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -520,10 +520,11 @@ int rxrpc_stats_show(struct seq_file *seq, void *v) atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE]), atomic_read(&rxnet->stat_rx_acks[0])); seq_printf(seq, - "Why-Req-A: acklost=%u mrtt=%u ortt=%u\n", + "Why-Req-A: acklost=%u mrtt=%u ortt=%u stall=%u\n", atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_ack_lost]), atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_more_rtt]), - atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt])); + atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt]), + atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_app_stall])); seq_printf(seq, "Why-Req-A: nolast=%u retx=%u slows=%u smtxw=%u\n", atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_no_srv_last]),

From patchwork Wed Dec 4 07:46:36 2024
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 08/39] rxrpc: Use a large kvec[] in rxrpc_local rather than every rxrpc_txbuf
Date: Wed, 4 Dec 2024 07:46:36 +0000
Message-ID: <20241204074710.990092-9-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Use a single large kvec[] in the rxrpc_local struct, rather than one in
every rxrpc_txbuf struct, to build large packets.  This saves memory.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/ar-internal.h | 6 ++++++ net/rxrpc/output.c | 45 ++++++++++++++++++++++++++++++----------- 2 files changed, 39 insertions(+), 12 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index d0fd37bdcfe9..ab8e565cb20b 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -320,6 +320,12 @@ struct rxrpc_local { struct list_head new_client_calls; /* Newly created client calls need connection */ spinlock_t client_call_lock; /* Lock for ->new_client_calls */ struct sockaddr_rxrpc srx; /* local address */ + /* Provide a kvec table sufficiently large to manage either a DATA + * packet with a maximum set of jumbo subpackets or a PING ACK padded + * out to 64K with zeropages for PMTUD. + */ + struct kvec kvec[RXRPC_MAX_NR_JUMBO > 3 + 16 ? + RXRPC_MAX_NR_JUMBO : 3 + 16]; }; /* diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index b93a5d50be3e..f8bb5250e849 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -175,9 +175,11 @@ static void rxrpc_begin_rtt_probe(struct rxrpc_call *call, rxrpc_serial_t serial /* * Transmit an ACK packet. 
*/ -static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) +static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb, + int nr_kv) { - struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; + struct kvec *kv = call->local->kvec; + struct rxrpc_wire_header *whdr = kv[0].iov_base; struct rxrpc_connection *conn; struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1); struct msghdr msg; @@ -206,8 +208,9 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t rxrpc_inc_stat(call->rxnet, stat_tx_ack_send); - iov_iter_kvec(&msg.msg_iter, WRITE, txb->kvec, txb->nr_kvec, txb->len); + iov_iter_kvec(&msg.msg_iter, WRITE, kv, nr_kv, txb->len); rxrpc_local_dont_fragment(conn->local, false); + ret = do_udp_sendmsg(conn->local->socket, &msg, txb->len); call->peer->last_tx_at = ktime_get_seconds(); if (ret < 0) { @@ -233,6 +236,8 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why) { struct rxrpc_txbuf *txb; + struct kvec *kv = call->local->kvec; + int nr_kv; if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) return; @@ -248,12 +253,19 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, txb->ack_why = why; rxrpc_fill_out_ack(call, txb, ack_reason, serial); + + nr_kv = txb->nr_kvec; + kv[0] = txb->kvec[0]; + kv[1] = txb->kvec[1]; + kv[2] = txb->kvec[2]; + // TODO: Extend a path MTU probe ACK + call->ackr_nr_unacked = 0; atomic_set(&call->ackr_nr_consumed, 0); clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags); trace_rxrpc_send_ack(call, why, ack_reason, serial); - rxrpc_send_ack_packet(call, txb); + rxrpc_send_ack_packet(call, txb, nr_kv); rxrpc_put_txbuf(txb, rxrpc_txbuf_put_ack_tx); } @@ -324,12 +336,15 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call) /* * Prepare a (sub)packet for transmission. 
*/ -static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb, - rxrpc_serial_t serial) +static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb, + rxrpc_serial_t serial, + int subpkt) { struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; enum rxrpc_req_ack_trace why; struct rxrpc_connection *conn = call->conn; + struct kvec *kv = &call->local->kvec[subpkt]; + size_t len = txb->len; bool last, more; u8 flags; @@ -385,8 +400,13 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t whdr->flags = flags; whdr->serial = htonl(txb->serial); whdr->cksum = txb->cksum; + whdr->serviceId = htons(conn->service_id); + kv->iov_base = whdr; + // TODO: Convert into a jumbo header for tail subpackets trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, false); + kv->iov_len = len; + return len; } /* @@ -395,13 +415,15 @@ static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_t static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) { rxrpc_serial_t serial; + size_t len = 0; /* Each transmission of a Tx packet needs a new serial number */ serial = rxrpc_get_next_serial(call->conn); - rxrpc_prepare_data_subpacket(call, txb, serial); + len += rxrpc_prepare_data_subpacket(call, txb, serial, 0); + // TODO: Loop around adding tail subpackets - return txb->len; + return len; } /* @@ -442,7 +464,6 @@ static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbu */ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) { - struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; struct rxrpc_connection *conn = call->conn; enum rxrpc_tx_point frag; struct msghdr msg; @@ -463,7 +484,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t } } - iov_iter_kvec(&msg.msg_iter, WRITE, txb->kvec, txb->nr_kvec, len); + iov_iter_kvec(&msg.msg_iter, WRITE, 
call->local->kvec, 1, len); msg.msg_name = &call->peer->srx.transport; msg.msg_namelen = call->peer->srx.transport_len; @@ -480,7 +501,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t /* send the packet with the don't fragment bit set if we currently * think it's small enough */ - if (txb->len >= call->peer->maxdata) { + if (len >= sizeof(struct rxrpc_wire_header) + call->peer->maxdata) { rxrpc_local_dont_fragment(conn->local, false); frag = rxrpc_tx_point_call_data_frag; } else { @@ -503,7 +524,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t rxrpc_inc_stat(call->rxnet, stat_tx_data_send_fail); trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret, frag); } else { - trace_rxrpc_tx_packet(call->debug_id, whdr, frag); + trace_rxrpc_tx_packet(call->debug_id, call->local->kvec[0].iov_base, frag); } rxrpc_tx_backoff(call, ret);

From patchwork Wed Dec 4 07:46:37 2024
server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5F8B91954ADD; Wed, 4 Dec 2024 07:47:56 +0000 (UTC) Received: from warthog.procyon.org.uk.com (unknown [10.42.28.48]) by mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id A03421956054; Wed, 4 Dec 2024 07:47:53 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 09/39] rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899) Date: Wed, 4 Dec 2024 07:46:37 +0000 Message-ID: <20241204074710.990092-10-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 X-Patchwork-Delegate: kuba@kernel.org Implement path-MTU probing (along the lines of RFC8899) by padding some of the PING ACKs we send. PING ACKs get their own individual responses quite apart from the acking of data (though, as ACKs, they fulfil that role also). The probing concentrates on packet sizes that correspond how many subpackets can be stuffed inside a jumbo packet as jumbo DATA packets are just aggregations of individual DATA packets and can be split easily for retransmission purposes. If we want to perform probing, we advertise this by setting the maximum number of jumbo subpackets to 0 in the ack trailer when we send an ACK and see if the peer is also advertising the service. This is interpreted by non-supporting Rx stacks as an indication that jumbo packets aren't supported. 
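The probe-size search performed by rxrpc_input_probe_for_pmtud() can be sketched in isolation as below. This is a hedged, illustrative model, not the kernel code: the 28-byte wire header, 4-byte jumbo header and 1412-byte subpacket data length are taken from the arithmetic in the protocol.h comment further down, and the helper names jumbo_size(), next_trial() and pmtud_search() are hypothetical. Jumbo-aligned sizes are tried first, in the same order as the patch (2, 4, 3, 6, 5, 8, 7 subpackets), before falling back to bisection between the largest size known to work and the smallest known to fail. The simulation assumes a path that drops anything over a fixed limit, and it does not model the single retry of a lost probe or reductions reported by ICMP or ACK trailers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, taken from the protocol.h comment in this patch. */
#define WIRE_HDR_LEN     28     /* sizeof(struct rxrpc_wire_header) */
#define JUMBO_HDR_LEN    4      /* sizeof(struct rxrpc_jumbo_header) */
#define JUMBO_DATALEN    1412   /* data bytes per jumbo subpacket */
#define JUMBO_SUBPKTLEN  (JUMBO_DATALEN + JUMBO_HDR_LEN)   /* 1416 */

/* Size of a jumbo packet holding n subpackets, excluding UDP+IP
 * (mirrors the RXRPC_JUMBO(N) macro added to protocol.h). */
static int jumbo_size(int n)
{
	return WIRE_HDR_LEN + JUMBO_DATALEN + (n - 1) * JUMBO_SUBPKTLEN;
}

/* Choose the next probe size strictly between good (largest size known to
 * get through) and bad (smallest size known to fail).  Jumbo-aligned sizes
 * are preferred, in the patch's order; otherwise bisect the gap. */
static int next_trial(int good, int bad)
{
	static const int order[] = { 2, 4, 3, 6, 5, 8, 7 };

	for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		int size = jumbo_size(order[i]);

		if (good < size && bad > size)
			return size;
	}
	return (good + bad) / 2;
}

/* Hypothetical simulation: the path drops anything larger than path_limit.
 * Returns the size the search converges on; *nprobes counts probes sent. */
static int pmtud_search(int good, int bad, int path_limit, int *nprobes)
{
	*nprobes = 0;
	while (good < bad - 1) {	/* the patch stops once good >= bad - 1 */
		int trial = next_trial(good, bad);

		assert(good < trial && trial < bad);
		if (trial <= path_limit)
			good = trial;	/* probe was acknowledged */
		else
			bad = trial;	/* probe was lost or hit EMSGSIZE */
		(*nprobes)++;
	}
	return good;
}
```

For example, starting from good = 500 and a hypothetical bad = 9001, the first probe is jumbo_size(2) = 2856 bytes; because every trial lies strictly between good and bad, the search always converges on the largest size the path accepts.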
The MTU sizes that AF_RXRPC advertises in the trailers of the ACKs it transmits are capped at 1444 unless both sides support path-MTU discovery.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 124 +++++++++++++++++++++++++++++++++++
 net/rxrpc/ar-internal.h      |  25 +++++--
 net/rxrpc/call_event.c       |   5 ++
 net/rxrpc/conn_event.c       |  17 +++--
 net/rxrpc/conn_object.c      |   6 ++
 net/rxrpc/input.c            |  26 +++++---
 net/rxrpc/io_thread.c        |   6 ++
 net/rxrpc/misc.c             |   4 +-
 net/rxrpc/output.c           |  67 +++++++++++++++----
 net/rxrpc/peer_event.c       | 104 +++++++++++++++++++++++++++--
 net/rxrpc/peer_object.c      |  24 +++++--
 net/rxrpc/proc.c             |   9 +--
 net/rxrpc/protocol.h         |  13 ++--
 net/rxrpc/sysctl.c           |   6 +-
 net/rxrpc/txbuf.c            |   3 +-
 15 files changed, 382 insertions(+), 57 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index d86b5f07d292..9dcadad88e76 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -364,6 +364,7 @@ EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \ EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \ EM(rxrpc_propose_ack_ping_for_0_retrans, "0-Retrn") \ + EM(rxrpc_propose_ack_ping_for_mtu_probe, "MTUProb") \ EM(rxrpc_propose_ack_ping_for_old_rtt, "OldRtt ") \ EM(rxrpc_propose_ack_ping_for_params, "Params ") \ EM(rxrpc_propose_ack_ping_for_rtt, "Rtt ") \ @@ -478,6 +479,11 @@ EM(rxrpc_txbuf_see_send_more, "SEE SEND+ ") \ E_(rxrpc_txbuf_see_unacked, "SEE UNACKED") +#define rxrpc_pmtud_reduce_traces \ + EM(rxrpc_pmtud_reduce_ack, "Ack ") \ + EM(rxrpc_pmtud_reduce_icmp, "Icmp ") \ + E_(rxrpc_pmtud_reduce_route, "Route") + /* * Generate enums for tracing information.
*/ @@ -498,6 +504,7 @@ enum rxrpc_congest_change { rxrpc_congest_changes } __mode(byte); enum rxrpc_conn_trace { rxrpc_conn_traces } __mode(byte); enum rxrpc_local_trace { rxrpc_local_traces } __mode(byte); enum rxrpc_peer_trace { rxrpc_peer_traces } __mode(byte); +enum rxrpc_pmtud_reduce_trace { rxrpc_pmtud_reduce_traces } __mode(byte); enum rxrpc_propose_ack_outcome { rxrpc_propose_ack_outcomes } __mode(byte); enum rxrpc_propose_ack_trace { rxrpc_propose_ack_traces } __mode(byte); enum rxrpc_receive_trace { rxrpc_receive_traces } __mode(byte); @@ -534,6 +541,7 @@ rxrpc_congest_changes; rxrpc_congest_modes; rxrpc_conn_traces; rxrpc_local_traces; +rxrpc_pmtud_reduce_traces; rxrpc_propose_ack_traces; rxrpc_receive_traces; rxrpc_recvmsg_traces; @@ -2040,6 +2048,122 @@ TRACE_EVENT(rxrpc_sack, __entry->sack) ); +TRACE_EVENT(rxrpc_pmtud_tx, + TP_PROTO(struct rxrpc_call *call), + + TP_ARGS(call), + + TP_STRUCT__entry( + __field(unsigned int, peer_debug_id) + __field(unsigned int, call_debug_id) + __field(rxrpc_serial_t, ping_serial) + __field(unsigned short, pmtud_trial) + __field(unsigned short, pmtud_good) + __field(unsigned short, pmtud_bad) + ), + + TP_fast_assign( + __entry->peer_debug_id = call->peer->debug_id; + __entry->call_debug_id = call->debug_id; + __entry->ping_serial = call->conn->pmtud_probe; + __entry->pmtud_trial = call->peer->pmtud_trial; + __entry->pmtud_good = call->peer->pmtud_good; + __entry->pmtud_bad = call->peer->pmtud_bad; + ), + + TP_printk("P=%08x c=%08x pr=%08x %u-%u-%u", + __entry->peer_debug_id, + __entry->call_debug_id, + __entry->ping_serial, + __entry->pmtud_good, + __entry->pmtud_trial, + __entry->pmtud_bad) + ); + +TRACE_EVENT(rxrpc_pmtud_rx, + TP_PROTO(struct rxrpc_connection *conn, rxrpc_serial_t resp_serial), + + TP_ARGS(conn, resp_serial), + + TP_STRUCT__entry( + __field(unsigned int, peer_debug_id) + __field(unsigned int, call_debug_id) + __field(rxrpc_serial_t, ping_serial) + __field(rxrpc_serial_t, resp_serial) + 
__field(unsigned short, max_data) + __field(u8, jumbo_max) + ), + + TP_fast_assign( + __entry->peer_debug_id = conn->peer->debug_id; + __entry->call_debug_id = conn->pmtud_call; + __entry->ping_serial = conn->pmtud_probe; + __entry->resp_serial = resp_serial; + __entry->max_data = conn->peer->max_data; + __entry->jumbo_max = conn->peer->pmtud_jumbo; + ), + + TP_printk("P=%08x c=%08x pr=%08x rr=%08x max=%u jm=%u", + __entry->peer_debug_id, + __entry->call_debug_id, + __entry->ping_serial, + __entry->resp_serial, + __entry->max_data, + __entry->jumbo_max) + ); + +TRACE_EVENT(rxrpc_pmtud_lost, + TP_PROTO(struct rxrpc_connection *conn, rxrpc_serial_t resp_serial), + + TP_ARGS(conn, resp_serial), + + TP_STRUCT__entry( + __field(unsigned int, peer_debug_id) + __field(unsigned int, call_debug_id) + __field(rxrpc_serial_t, ping_serial) + __field(rxrpc_serial_t, resp_serial) + ), + + TP_fast_assign( + __entry->peer_debug_id = conn->peer->debug_id; + __entry->call_debug_id = conn->pmtud_call; + __entry->ping_serial = conn->pmtud_probe; + __entry->resp_serial = resp_serial; + ), + + TP_printk("P=%08x c=%08x pr=%08x rr=%08x", + __entry->peer_debug_id, + __entry->call_debug_id, + __entry->ping_serial, + __entry->resp_serial) + ); + +TRACE_EVENT(rxrpc_pmtud_reduce, + TP_PROTO(struct rxrpc_peer *peer, rxrpc_serial_t serial, + unsigned int max_data, enum rxrpc_pmtud_reduce_trace reason), + + TP_ARGS(peer, serial, max_data, reason), + + TP_STRUCT__entry( + __field(unsigned int, peer_debug_id) + __field(rxrpc_serial_t, serial) + __field(unsigned int, max_data) + __field(enum rxrpc_pmtud_reduce_trace, reason) + ), + + TP_fast_assign( + __entry->peer_debug_id = peer->debug_id; + __entry->serial = serial; + __entry->max_data = max_data; + __entry->reason = reason; + ), + + TP_printk("P=%08x %s r=%08x m=%u", + __entry->peer_debug_id, + __print_symbolic(__entry->reason, rxrpc_pmtud_reduce_traces), + __entry->serial, __entry->max_data) + ); + #undef EM #undef E_ diff --git 
a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index ab8e565cb20b..69e6f4b20bad 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -344,13 +344,25 @@ struct rxrpc_peer { time64_t last_tx_at; /* Last time packet sent here */ seqlock_t service_conn_lock; spinlock_t lock; /* access lock */ - unsigned int if_mtu; /* interface MTU for this peer */ - unsigned int mtu; /* network MTU for this peer */ - unsigned int maxdata; /* data size (MTU - hdrsize) */ - unsigned short hdrsize; /* header size (IP + UDP + RxRPC) */ int debug_id; /* debug ID for printks */ struct sockaddr_rxrpc srx; /* remote address */ + /* Path MTU discovery [RFC8899] */ + unsigned int pmtud_trial; /* Current MTU probe size */ + unsigned int pmtud_good; /* Largest working MTU probe we've tried */ + unsigned int pmtud_bad; /* Smallest non-working MTU probe we've tried */ + bool pmtud_lost; /* T if MTU probe was lost */ + bool pmtud_probing; /* T if we have an active probe outstanding */ + bool pmtud_pending; /* T if a call to this peer should send a probe */ + u8 pmtud_jumbo; /* Max jumbo packets for the MTU */ + bool ackr_adv_pmtud; /* T if the peer advertises path-MTU */ + unsigned int ackr_max_data; /* Maximum data advertised by peer */ + seqcount_t mtu_lock; /* Lockless MTU access management */ + unsigned int if_mtu; /* Local interface MTU (- hdrsize) for this peer */ + unsigned int max_data; /* Maximum packet data capacity for this peer */ + unsigned short hdrsize; /* header size (IP + UDP + RxRPC) */ + unsigned short tx_seg_max; /* Maximum number of transmissable segments */ + /* calculated RTT cache */ #define RXRPC_RTT_CACHE_SIZE 32 spinlock_t rtt_input_lock; /* RTT lock for input routine */ @@ -531,6 +543,8 @@ struct rxrpc_connection { int debug_id; /* debug ID for printks */ rxrpc_serial_t tx_serial; /* Outgoing packet serial number counter */ unsigned int hi_serial; /* highest serial number received */ + rxrpc_serial_t pmtud_probe; /* Serial of MTU probe (or 0) */ + 
unsigned int pmtud_call; /* ID of call used for probe */ u32 service_id; /* Service ID, possibly upgraded */ u32 security_level; /* Security level selected */ u8 security_ix; /* security type */ @@ -1155,6 +1169,7 @@ static inline struct rxrpc_net *rxrpc_net(struct net *net) */ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why); +void rxrpc_send_probe_for_pmtud(struct rxrpc_call *call); int rxrpc_send_abort_packet(struct rxrpc_call *); void rxrpc_send_conn_abort(struct rxrpc_connection *conn); void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb); @@ -1166,6 +1181,8 @@ void rxrpc_transmit_one(struct rxrpc_call *call, struct rxrpc_txbuf *txb); */ void rxrpc_input_error(struct rxrpc_local *, struct sk_buff *); void rxrpc_peer_keepalive_worker(struct work_struct *); +void rxrpc_input_probe_for_pmtud(struct rxrpc_connection *conn, rxrpc_serial_t acked_serial, + bool sendmsg_fail); /* * peer_object.c diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index c4754cc9b8d4..1d889b6f0366 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -483,6 +483,11 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_disconnect_call(call); if (call->security) call->security->free_call_crypto(call); + } else { + if (skb && + call->peer->ackr_adv_pmtud && + call->peer->pmtud_pending) + rxrpc_send_probe_for_pmtud(call); } if (call->acks_hard_ack != call->tx_bottom) rxrpc_shrink_call_tx_buffer(call); diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 2a1396cd892f..f6c02cc44d98 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -92,7 +92,7 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, struct rxrpc_acktrailer trailer; size_t len; int ret, ioc; - u32 serial, mtu, call_id, padding; + u32 serial, max_mtu, if_mtu, call_id, padding; _enter("%d", conn->debug_id); @@ -150,8 +150,13 @@ void 
rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, break; case RXRPC_PACKET_TYPE_ACK: - mtu = conn->peer->if_mtu; - mtu -= conn->peer->hdrsize; + if_mtu = conn->peer->if_mtu - conn->peer->hdrsize; + if (conn->peer->ackr_adv_pmtud) { + max_mtu = umax(conn->peer->max_data, rxrpc_rx_mtu); + } else { + if_mtu = umin(1444, if_mtu); + max_mtu = if_mtu; + } pkt.ack.bufferSpace = 0; pkt.ack.maxSkew = htons(skb ? skb->priority : 0); pkt.ack.firstPacket = htonl(chan->last_seq + 1); @@ -159,10 +164,10 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, pkt.ack.serial = htonl(skb ? sp->hdr.serial : 0); pkt.ack.reason = skb ? RXRPC_ACK_DUPLICATE : RXRPC_ACK_IDLE; pkt.ack.nAcks = 0; - trailer.maxMTU = htonl(rxrpc_rx_mtu); - trailer.ifMTU = htonl(mtu); + trailer.maxMTU = htonl(max_mtu); + trailer.ifMTU = htonl(if_mtu); trailer.rwind = htonl(rxrpc_rx_window_size); - trailer.jumbo_max = htonl(rxrpc_rx_jumbo_max); + trailer.jumbo_max = 0; pkt.whdr.flags |= RXRPC_SLOW_START_OK; padding = 0; iov[0].iov_len += sizeof(pkt.ack); diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index 694c4df7a1a3..b0627398311b 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -321,6 +321,12 @@ static void rxrpc_clean_up_connection(struct work_struct *work) list_del_init(&conn->proc_link); write_unlock(&rxnet->conn_lock); + if (conn->pmtud_probe) { + trace_rxrpc_pmtud_lost(conn, 0); + conn->peer->pmtud_probing = false; + conn->peer->pmtud_pending = true; + } + rxrpc_purge_queue(&conn->rx_queue); rxrpc_kill_client_conn(conn); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 49e35be7dc13..fd08d813ef29 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -692,8 +692,8 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb struct rxrpc_acktrailer *trailer) { struct rxrpc_skb_priv *sp = rxrpc_skb(skb); - struct rxrpc_peer *peer; - unsigned int mtu; + struct rxrpc_peer *peer = call->peer; + unsigned int max_data; 
bool wake = false; u32 rwind = ntohl(trailer->rwind); @@ -706,14 +706,22 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb call->tx_winsize = rwind; } - mtu = umin(ntohl(trailer->maxMTU), ntohl(trailer->ifMTU)); + if (trailer->jumbo_max == 0) { + /* The peer says it supports pmtu discovery */ + peer->ackr_adv_pmtud = true; + } else { + peer->ackr_adv_pmtud = false; + } + + max_data = ntohl(trailer->maxMTU); + peer->ackr_max_data = max_data; - peer = call->peer; - if (mtu < peer->maxdata) { - spin_lock(&peer->lock); - peer->maxdata = mtu; - peer->mtu = mtu + peer->hdrsize; - spin_unlock(&peer->lock); + if (max_data < peer->max_data) { + trace_rxrpc_pmtud_reduce(peer, sp->hdr.serial, max_data, + rxrpc_pmtud_reduce_ack); + write_seqcount_begin(&peer->mtu_lock); + peer->max_data = max_data; + write_seqcount_end(&peer->mtu_lock); } if (wake) diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c index 7af5adf53b25..bd6d4f5e97b4 100644 --- a/net/rxrpc/io_thread.c +++ b/net/rxrpc/io_thread.c @@ -364,6 +364,12 @@ static int rxrpc_input_packet_on_conn(struct rxrpc_connection *conn, if (sp->hdr.callNumber == 0) return rxrpc_input_conn_packet(conn, skb); + /* Deal with path MTU discovery probing. */ + if (sp->hdr.type == RXRPC_PACKET_TYPE_ACK && + conn->pmtud_probe && + after_eq(sp->ack.acked_serial, conn->pmtud_probe)) + rxrpc_input_probe_for_pmtud(conn, sp->ack.acked_serial, false); + /* Call-bound packets are routed by connection channel. */ channel = sp->hdr.cid & RXRPC_CHANNELMASK; chan = &conn->channels[channel]; diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c index 657cf35089a6..8fcc8139d771 100644 --- a/net/rxrpc/misc.c +++ b/net/rxrpc/misc.c @@ -46,13 +46,13 @@ unsigned int rxrpc_rx_window_size = 255; * Maximum Rx MTU size. This indicates to the sender the size of jumbo packet * made by gluing normal packets together that we're willing to handle. 
*/ -unsigned int rxrpc_rx_mtu = 5692; +unsigned int rxrpc_rx_mtu = RXRPC_JUMBO(46); /* * The maximum number of fragments in a received jumbo packet that we tell the * sender that we're willing to handle. */ -unsigned int rxrpc_rx_jumbo_max = 4; +unsigned int rxrpc_rx_jumbo_max = 46; #ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY /* diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index f8bb5250e849..a91be871ad96 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -82,10 +82,9 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call, struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; struct rxrpc_acktrailer *trailer = txb->kvec[2].iov_base + 3; struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1); - unsigned int qsize, sack, wrap, to; + unsigned int qsize, sack, wrap, to, max_mtu, if_mtu; rxrpc_seq_t window, wtop; int rsize; - u32 mtu, jmax; u8 *filler = txb->kvec[2].iov_base; u8 *sackp = txb->kvec[1].iov_base; @@ -132,16 +131,22 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call, ack->reason = RXRPC_ACK_IDLE; } - mtu = call->peer->if_mtu; - mtu -= call->peer->hdrsize; - jmax = rxrpc_rx_jumbo_max; qsize = (window - 1) - call->rx_consumed; rsize = max_t(int, call->rx_winsize - qsize, 0); txb->ack_rwind = rsize; - trailer->maxMTU = htonl(rxrpc_rx_mtu); - trailer->ifMTU = htonl(mtu); + + if_mtu = call->peer->if_mtu - call->peer->hdrsize; + if (call->peer->ackr_adv_pmtud) { + max_mtu = umax(call->peer->max_data, rxrpc_rx_mtu); + } else { + if_mtu = umin(if_mtu, 1444); + max_mtu = if_mtu; + } + + trailer->maxMTU = htonl(max_mtu); + trailer->ifMTU = htonl(if_mtu); trailer->rwind = htonl(rsize); - trailer->jumbo_max = htonl(jmax); + trailer->jumbo_max = 0; /* Advertise pmtu discovery */ } /* @@ -176,7 +181,7 @@ static void rxrpc_begin_rtt_probe(struct rxrpc_call *call, rxrpc_serial_t serial * Transmit an ACK packet. 
*/ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb, - int nr_kv) + int nr_kv, enum rxrpc_propose_ack_trace why) { struct kvec *kv = call->local->kvec; struct rxrpc_wire_header *whdr = kv[0].iov_base; @@ -209,13 +214,16 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t rxrpc_inc_stat(call->rxnet, stat_tx_ack_send); iov_iter_kvec(&msg.msg_iter, WRITE, kv, nr_kv, txb->len); - rxrpc_local_dont_fragment(conn->local, false); + rxrpc_local_dont_fragment(conn->local, why == rxrpc_propose_ack_ping_for_mtu_probe); ret = do_udp_sendmsg(conn->local->socket, &msg, txb->len); call->peer->last_tx_at = ktime_get_seconds(); if (ret < 0) { trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret, rxrpc_tx_point_call_ack); + if (why == rxrpc_propose_ack_ping_for_mtu_probe && + ret == -EMSGSIZE) + rxrpc_input_probe_for_pmtud(conn, txb->serial, true); } else { trace_rxrpc_tx_packet(call->debug_id, whdr, rxrpc_tx_point_call_ack); @@ -225,6 +233,13 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t if (txb->flags & RXRPC_REQUEST_ACK) call->peer->rtt_last_req = now; rxrpc_set_keepalive(call, now); + if (why == rxrpc_propose_ack_ping_for_mtu_probe) { + call->peer->pmtud_pending = false; + call->peer->pmtud_probing = true; + call->conn->pmtud_probe = txb->serial; + call->conn->pmtud_call = call->debug_id; + trace_rxrpc_pmtud_tx(call); + } } rxrpc_tx_backoff(call, ret); } @@ -254,21 +269,45 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, rxrpc_fill_out_ack(call, txb, ack_reason, serial); + /* Extend a path MTU probe ACK. 
*/ nr_kv = txb->nr_kvec; kv[0] = txb->kvec[0]; kv[1] = txb->kvec[1]; kv[2] = txb->kvec[2]; - // TODO: Extend a path MTU probe ACK + if (why == rxrpc_propose_ack_ping_for_mtu_probe) { + size_t probe_mtu = call->peer->pmtud_trial + sizeof(struct rxrpc_wire_header); + + if (txb->len > probe_mtu) + goto skip; + while (txb->len < probe_mtu) { + size_t part = umin(probe_mtu - txb->len, PAGE_SIZE); + + kv[nr_kv].iov_base = page_address(ZERO_PAGE(0)); + kv[nr_kv].iov_len = part; + txb->len += part; + nr_kv++; + } + } call->ackr_nr_unacked = 0; atomic_set(&call->ackr_nr_consumed, 0); clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags); trace_rxrpc_send_ack(call, why, ack_reason, serial); - rxrpc_send_ack_packet(call, txb, nr_kv); + rxrpc_send_ack_packet(call, txb, nr_kv, why); +skip: rxrpc_put_txbuf(txb, rxrpc_txbuf_put_ack_tx); } +/* + * Send an ACK probe for path MTU discovery. + */ +void rxrpc_send_probe_for_pmtud(struct rxrpc_call *call) +{ + rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, + rxrpc_propose_ack_ping_for_mtu_probe); +} + /* * Send an ABORT call packet. 
*/ @@ -501,7 +540,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t /* send the packet with the don't fragment bit set if we currently * think it's small enough */ - if (len >= sizeof(struct rxrpc_wire_header) + call->peer->maxdata) { + if (len >= sizeof(struct rxrpc_wire_header) + call->peer->max_data) { rxrpc_local_dont_fragment(conn->local, false); frag = rxrpc_tx_point_call_data_frag; } else { @@ -548,7 +587,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t RX_USER_ABORT, ret); } - _leave(" = %d [%u]", ret, call->peer->maxdata); + _leave(" = %d [%u]", ret, call->peer->max_data); return ret; } diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c index 552ba84a255c..8fc9464a960c 100644 --- a/net/rxrpc/peer_event.c +++ b/net/rxrpc/peer_event.c @@ -102,6 +102,8 @@ static struct rxrpc_peer *rxrpc_lookup_peer_local_rcu(struct rxrpc_local *local, */ static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, unsigned int mtu) { + unsigned int max_data; + /* wind down the local interface MTU */ if (mtu > 0 && peer->if_mtu == 65535 && mtu < peer->if_mtu) peer->if_mtu = mtu; @@ -120,11 +122,17 @@ static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, unsigned int mtu) } } - if (mtu < peer->mtu) { - spin_lock(&peer->lock); - peer->mtu = mtu; - peer->maxdata = peer->mtu - peer->hdrsize; - spin_unlock(&peer->lock); + max_data = max_t(int, mtu - peer->hdrsize, 500); + if (max_data < peer->max_data) { + if (peer->pmtud_good > max_data) + peer->pmtud_good = max_data; + if (peer->pmtud_bad > max_data + 1) + peer->pmtud_bad = max_data + 1; + + trace_rxrpc_pmtud_reduce(peer, 0, max_data, rxrpc_pmtud_reduce_icmp); + write_seqcount_begin(&peer->mtu_lock); + peer->max_data = max_data; + write_seqcount_end(&peer->mtu_lock); } } @@ -347,3 +355,89 @@ void rxrpc_peer_keepalive_worker(struct work_struct *work) _leave(""); } + +/* + * Do path MTU probing. 
+ */ +void rxrpc_input_probe_for_pmtud(struct rxrpc_connection *conn, rxrpc_serial_t acked_serial, + bool sendmsg_fail) +{ + struct rxrpc_peer *peer = conn->peer; + unsigned int max_data = peer->max_data; + int good, trial, bad, jumbo; + + good = peer->pmtud_good; + trial = peer->pmtud_trial; + bad = peer->pmtud_bad; + if (good >= bad - 1) { + conn->pmtud_probe = 0; + peer->pmtud_lost = false; + return; + } + + if (!peer->pmtud_probing) + goto send_probe; + + if (sendmsg_fail || after(acked_serial, conn->pmtud_probe)) { + /* Retry a lost probe. */ + if (!peer->pmtud_lost) { + trace_rxrpc_pmtud_lost(conn, acked_serial); + conn->pmtud_probe = 0; + peer->pmtud_lost = true; + goto send_probe; + } + + /* The probed size didn't seem to get through. */ + bad = trial; + peer->pmtud_bad = bad; + if (bad <= max_data) + max_data = bad - 1; + } else { + /* It did get through. */ + good = trial; + peer->pmtud_good = good; + if (good > max_data) + max_data = good; + } + + max_data = umin(max_data, peer->ackr_max_data); + if (max_data != peer->max_data) { + preempt_disable(); + write_seqcount_begin(&peer->mtu_lock); + peer->max_data = max_data; + write_seqcount_end(&peer->mtu_lock); + preempt_enable(); + } + + jumbo = max_data + sizeof(struct rxrpc_jumbo_header); + jumbo /= RXRPC_JUMBO_SUBPKTLEN; + peer->pmtud_jumbo = jumbo; + + trace_rxrpc_pmtud_rx(conn, acked_serial); + conn->pmtud_probe = 0; + peer->pmtud_lost = false; + + if (good < RXRPC_JUMBO(2) && bad > RXRPC_JUMBO(2)) + trial = RXRPC_JUMBO(2); + else if (good < RXRPC_JUMBO(4) && bad > RXRPC_JUMBO(4)) + trial = RXRPC_JUMBO(4); + else if (good < RXRPC_JUMBO(3) && bad > RXRPC_JUMBO(3)) + trial = RXRPC_JUMBO(3); + else if (good < RXRPC_JUMBO(6) && bad > RXRPC_JUMBO(6)) + trial = RXRPC_JUMBO(6); + else if (good < RXRPC_JUMBO(5) && bad > RXRPC_JUMBO(5)) + trial = RXRPC_JUMBO(5); + else if (good < RXRPC_JUMBO(8) && bad > RXRPC_JUMBO(8)) + trial = RXRPC_JUMBO(8); + else if (good < RXRPC_JUMBO(7) && bad > RXRPC_JUMBO(7)) + trial = 
RXRPC_JUMBO(7); + else + trial = (good + bad) / 2; + peer->pmtud_trial = trial; + + if (good >= bad) + return; + +send_probe: + peer->pmtud_pending = true; +} diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c index 49dcda67a0d5..80ef6f06d512 100644 --- a/net/rxrpc/peer_object.c +++ b/net/rxrpc/peer_object.c @@ -162,6 +162,11 @@ static void rxrpc_assess_MTU_size(struct rxrpc_local *local, #endif peer->if_mtu = 1500; + if (peer->max_data < peer->if_mtu - peer->hdrsize) { + trace_rxrpc_pmtud_reduce(peer, 0, peer->if_mtu - peer->hdrsize, + rxrpc_pmtud_reduce_route); + peer->max_data = peer->if_mtu - peer->hdrsize; + } memset(&fl, 0, sizeof(fl)); switch (peer->srx.transport.family) { @@ -199,8 +204,16 @@ static void rxrpc_assess_MTU_size(struct rxrpc_local *local, } peer->if_mtu = dst_mtu(dst); + peer->hdrsize += dst->header_len + dst->trailer_len; + peer->tx_seg_max = dst->dev->gso_max_segs; dst_release(dst); + peer->max_data = umin(RXRPC_JUMBO(1), peer->if_mtu - peer->hdrsize); + peer->pmtud_good = 500; + peer->pmtud_bad = peer->if_mtu - peer->hdrsize + 1; + peer->pmtud_trial = umin(peer->max_data, peer->pmtud_bad - 1); + peer->pmtud_pending = true; + _leave(" [if_mtu %u]", peer->if_mtu); } @@ -223,6 +236,7 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp, seqlock_init(&peer->service_conn_lock); spin_lock_init(&peer->lock); spin_lock_init(&peer->rtt_input_lock); + seqcount_init(&peer->mtu_lock); peer->debug_id = atomic_inc_return(&rxrpc_debug_id); rxrpc_peer_init_rtt(peer); @@ -242,9 +256,7 @@ static void rxrpc_init_peer(struct rxrpc_local *local, struct rxrpc_peer *peer, unsigned long hash_key) { peer->hash_key = hash_key; - rxrpc_assess_MTU_size(local, peer); - peer->mtu = peer->if_mtu; - peer->rtt_last_req = ktime_get_real(); + switch (peer->srx.transport.family) { case AF_INET: @@ -268,7 +280,11 @@ static void rxrpc_init_peer(struct rxrpc_local *local, struct rxrpc_peer *peer, } peer->hdrsize += sizeof(struct 
rxrpc_wire_header); - peer->maxdata = peer->mtu - peer->hdrsize; + peer->max_data = peer->if_mtu - peer->hdrsize; + + rxrpc_assess_MTU_size(local, peer); + + peer->rtt_last_req = ktime_get_real(); } /* diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index ce4d48bdfbe9..44722c226064 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -283,9 +283,7 @@ static int rxrpc_peer_seq_show(struct seq_file *seq, void *v) if (v == SEQ_START_TOKEN) { seq_puts(seq, - "Proto Local " - " Remote " - " Use SST MTU LastUse RTT RTO\n" + "Proto Local Remote Use SST Maxd LastUse RTT RTO\n" ); return 0; } @@ -298,13 +296,12 @@ static int rxrpc_peer_seq_show(struct seq_file *seq, void *v) now = ktime_get_seconds(); seq_printf(seq, - "UDP %-47.47s %-47.47s %3u" - " %3u %5u %6llus %8u %8u\n", + "UDP %-47.47s %-47.47s %3u %4u %5u %6llus %8u %8u\n", lbuff, rbuff, refcount_read(&peer->ref), peer->cong_ssthresh, - peer->mtu, + peer->max_data, now - peer->last_tx_at, peer->srtt_us >> 3, peer->rto_us); diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h index 4fe6b4d20ada..42f70e4636f8 100644 --- a/net/rxrpc/protocol.h +++ b/net/rxrpc/protocol.h @@ -92,11 +92,16 @@ struct rxrpc_jumbo_header { /* * The maximum number of subpackets that can possibly fit in a UDP packet is: * - * ((max_IP - IP_hdr - UDP_hdr) / RXRPC_JUMBO_SUBPKTLEN) + 1 - * = ((65535 - 28 - 28) / 1416) + 1 - * = 46 non-terminal packets and 1 terminal packet. + * (max_UDP - wirehdr + jumbohdr) / (jumbohdr + 1412) + * = ((65535 - 28 + 4) / 1416) + * = 45 non-terminal packets and 1 terminal packet. 
*/ -#define RXRPC_MAX_NR_JUMBO 47 +#define RXRPC_MAX_NR_JUMBO 46 + +/* Size of a jumbo packet with N subpackets, excluding UDP+IP */ +#define RXRPC_JUMBO(N) ((int)sizeof(struct rxrpc_wire_header) + \ + RXRPC_JUMBO_DATALEN + \ + ((N) - 1) * RXRPC_JUMBO_SUBPKTLEN) /*****************************************************************************/ /* diff --git a/net/rxrpc/sysctl.c b/net/rxrpc/sysctl.c index 9bf9a1f6e4cb..46a20cf4c402 100644 --- a/net/rxrpc/sysctl.c +++ b/net/rxrpc/sysctl.c @@ -11,6 +11,8 @@ #include "ar-internal.h" static struct ctl_table_header *rxrpc_sysctl_reg_table; +static const unsigned int rxrpc_rx_mtu_min = 500; +static const unsigned int rxrpc_jumbo_max = RXRPC_MAX_NR_JUMBO; static const unsigned int four = 4; static const unsigned int max_backlog = RXRPC_BACKLOG_MAX - 1; static const unsigned int n_65535 = 65535; @@ -115,7 +117,7 @@ static struct ctl_table rxrpc_sysctl_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)SYSCTL_ONE, + .extra1 = (void *)&rxrpc_rx_mtu_min, .extra2 = (void *)&n_65535, }, { @@ -125,7 +127,7 @@ static struct ctl_table rxrpc_sysctl_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = (void *)SYSCTL_ONE, - .extra2 = (void *)&four, + .extra2 = (void *)&rxrpc_jumbo_max, }, }; diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c index c3913d8a50d3..2a4291617d40 100644 --- a/net/rxrpc/txbuf.c +++ b/net/rxrpc/txbuf.c @@ -179,7 +179,8 @@ static void rxrpc_free_txbuf(struct rxrpc_txbuf *txb) trace_rxrpc_txbuf(txb->debug_id, txb->call_debug_id, txb->seq, 0, rxrpc_txbuf_free); for (i = 0; i < txb->nr_kvec; i++) - if (txb->kvec[i].iov_base) + if (txb->kvec[i].iov_base && + !is_zero_pfn(page_to_pfn(virt_to_page(txb->kvec[i].iov_base)))) page_frag_free(txb->kvec[i].iov_base); kfree(txb); atomic_dec(&rxrpc_nr_txbuf); From patchwork Wed Dec 4 07:46:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893327
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 10/39] rxrpc: Separate the packet length from the data length in rxrpc_txbuf
Date: Wed, 4 Dec 2024 07:46:38 +0000
Message-ID: <20241204074710.990092-11-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Separate the packet length (txb->pkt_len) from the data length (txb->len) stored in the rxrpc_txbuf to make the security calculations easier. Also store the allocation size (txb->alloc_size), as that is an upper bound on the size of the security wrapper, and change a number of fields to unsigned short, since the amount of data can't exceed the capacity of a UDP packet.

Also, whilst we're at it, use kzalloc() for txbufs.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/ar-internal.h |  8 +++++---
 net/rxrpc/insecure.c    |  1 +
 net/rxrpc/output.c      |  7 ++++---
 net/rxrpc/rxkad.c       | 44 ++++++++++++++++++++++-------------------
 net/rxrpc/sendmsg.c     |  1 -
 net/rxrpc/txbuf.c       |  7 ++-----
 6 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 69e6f4b20bad..a5c0bc917641 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -821,9 +821,11 @@ struct rxrpc_txbuf { rxrpc_serial_t serial; /* Last serial number transmitted with */ unsigned int call_debug_id; unsigned int debug_id; - unsigned int len; /* Amount of data in buffer */ - unsigned int space; /* Remaining data space */ - unsigned int offset; /* Offset of fill point */ + unsigned short len; /* Amount of data in buffer */ + unsigned short space; /* Remaining data space */ + unsigned short offset; /* Offset of fill point */ + unsigned short pkt_len; /* Size of packet content */ + unsigned short alloc_size; /* Amount of bufferage allocated */ unsigned int flags; #define RXRPC_TXBUF_WIRE_FLAGS 0xff /* The wire protocol flags */ #define RXRPC_TXBUF_RESENT 0x100 /* Set if has been resent */ diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c index 751eb621021d..d665f486be5f 100644 --- a/net/rxrpc/insecure.c +++ b/net/rxrpc/insecure.c @@ -24,6 +24,7 @@ static struct rxrpc_txbuf *none_alloc_txbuf(struct rxrpc_call *call, size_t rema static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) { + txb->pkt_len = txb->len; return 0; } diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index a91be871ad96..df9af4ad4260 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -383,11 +383,11 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc enum rxrpc_req_ack_trace why; struct rxrpc_connection *conn = call->conn; struct kvec *kv =
&call->local->kvec[subpkt]; - size_t len = txb->len; + size_t len = txb->pkt_len; bool last, more; u8 flags; - _enter("%x,{%d}", txb->seq, txb->len); + _enter("%x,%zd", txb->seq, len); txb->serial = serial; @@ -441,6 +441,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc whdr->cksum = txb->cksum; whdr->serviceId = htons(conn->service_id); kv->iov_base = whdr; + len += sizeof(*whdr); // TODO: Convert into a jumbo header for tail subpackets trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, false); @@ -509,7 +510,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t size_t len; int ret; - _enter("%x,{%d}", txb->seq, txb->len); + _enter("%x,{%d}", txb->seq, txb->pkt_len); len = rxrpc_prepare_data_packet(call, txb); diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c index e3194d73dd84..755897fab626 100644 --- a/net/rxrpc/rxkad.c +++ b/net/rxrpc/rxkad.c @@ -148,14 +148,14 @@ static int rxkad_init_connection_security(struct rxrpc_connection *conn, static struct rxrpc_txbuf *rxkad_alloc_txbuf(struct rxrpc_call *call, size_t remain, gfp_t gfp) { struct rxrpc_txbuf *txb; - size_t shdr, space; + size_t shdr, alloc, limit, part; remain = umin(remain, 65535 - sizeof(struct rxrpc_wire_header)); switch (call->conn->security_level) { default: - space = umin(remain, RXRPC_JUMBO_DATALEN); - return rxrpc_alloc_data_txbuf(call, space, 1, gfp); + alloc = umin(remain, RXRPC_JUMBO_DATALEN); + return rxrpc_alloc_data_txbuf(call, alloc, 1, gfp); case RXRPC_SECURITY_AUTH: shdr = sizeof(struct rxkad_level1_hdr); break; @@ -164,15 +164,21 @@ static struct rxrpc_txbuf *rxkad_alloc_txbuf(struct rxrpc_call *call, size_t rem break; } - space = umin(round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN), remain + shdr); - space = round_up(space, RXKAD_ALIGN); + limit = round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN) - shdr; + if (remain < limit) { + part = remain; + alloc = round_up(shdr + part, RXKAD_ALIGN); + } else { + part = limit; + 
alloc = RXRPC_JUMBO_DATALEN; + } - txb = rxrpc_alloc_data_txbuf(call, space, RXKAD_ALIGN, gfp); + txb = rxrpc_alloc_data_txbuf(call, alloc, RXKAD_ALIGN, gfp); if (!txb) return NULL; txb->offset += shdr; - txb->space -= shdr; + txb->space = part; return txb; } @@ -263,13 +269,13 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call, check = txb->seq ^ call->call_id; hdr->data_size = htonl((u32)check << 16 | txb->len); - txb->len += sizeof(struct rxkad_level1_hdr); - pad = txb->len; + txb->pkt_len = sizeof(struct rxkad_level1_hdr) + txb->len; + pad = txb->pkt_len; pad = RXKAD_ALIGN - pad; pad &= RXKAD_ALIGN - 1; if (pad) { memset(txb->kvec[0].iov_base + txb->offset, 0, pad); - txb->len += pad; + txb->pkt_len += pad; } /* start the encryption afresh */ @@ -298,7 +304,7 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call, struct rxkad_level2_hdr *rxkhdr = (void *)(whdr + 1); struct rxrpc_crypt iv; struct scatterlist sg; - size_t pad; + size_t content, pad; u16 check; int ret; @@ -309,23 +315,20 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call, rxkhdr->data_size = htonl(txb->len | (u32)check << 16); rxkhdr->checksum = 0; - txb->len += sizeof(struct rxkad_level2_hdr); - pad = txb->len; - pad = RXKAD_ALIGN - pad; - pad &= RXKAD_ALIGN - 1; - if (pad) { + content = sizeof(struct rxkad_level2_hdr) + txb->len; + txb->pkt_len = round_up(content, RXKAD_ALIGN); + pad = txb->pkt_len - content; + if (pad) memset(txb->kvec[0].iov_base + txb->offset, 0, pad); - txb->len += pad; - } /* encrypt from the session key */ token = call->conn->key->payload.data[0]; memcpy(&iv, token->kad->session_key, sizeof(iv)); - sg_init_one(&sg, rxkhdr, txb->len); + sg_init_one(&sg, rxkhdr, txb->pkt_len); skcipher_request_set_sync_tfm(req, call->conn->rxkad.cipher); skcipher_request_set_callback(req, 0, NULL, NULL); - skcipher_request_set_crypt(req, &sg, &sg, txb->len, iv.x); + skcipher_request_set_crypt(req, &sg, &sg, txb->pkt_len, iv.x); ret = 
crypto_skcipher_encrypt(req); skcipher_request_zero(req); return ret; @@ -384,6 +387,7 @@ static int rxkad_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) switch (call->conn->security_level) { case RXRPC_SECURITY_PLAIN: + txb->pkt_len = txb->len; ret = 0; break; case RXRPC_SECURITY_AUTH: diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 546abb463c3f..786c1fb1369a 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -391,7 +391,6 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, goto out; txb->kvec[0].iov_len += txb->len; - txb->len = txb->kvec[0].iov_len; rxrpc_queue_packet(rx, call, txb, notify_end_tx); txb = NULL; } diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c index 2a4291617d40..8b7c854ed3d7 100644 --- a/net/rxrpc/txbuf.c +++ b/net/rxrpc/txbuf.c @@ -24,7 +24,7 @@ struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_ size_t total, hoff; void *buf; - txb = kmalloc(sizeof(*txb), gfp); + txb = kzalloc(sizeof(*txb), gfp); if (!txb) return NULL; @@ -49,14 +49,11 @@ struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_ txb->last_sent = KTIME_MIN; txb->call_debug_id = call->debug_id; txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids); + txb->alloc_size = data_size; txb->space = data_size; - txb->len = 0; txb->offset = sizeof(*whdr); txb->flags = call->conn->out_clientflag; - txb->ack_why = 0; txb->seq = call->tx_prepared + 1; - txb->serial = 0; - txb->cksum = 0; txb->nr_kvec = 1; txb->kvec[0].iov_base = whdr; txb->kvec[0].iov_len = sizeof(*whdr); From patchwork Wed Dec 4 07:46:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893328 X-Patchwork-Delegate: kuba@kernel.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 11/39] rxrpc: Prepare to be able to send jumbo DATA packets
Date: Wed, 4 Dec 2024 07:46:39 +0000
Message-ID: <20241204074710.990092-12-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Prepare to be able to send jumbo DATA packets if we decide to, but don't
enable that yet.
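The jumbo support in this patch relies on two small calculations that are easy to check in isolation: reserving one serial number per subpacket from a block that never starts at 0 (rxrpc_get_next_serials()), and working out how many full-sized subpackets fit in the peer's advertised data size (the "capacity" arithmetic added to rxrpc_input_ack_trailer()). Below is a minimal user-space sketch of both, not kernel code. The sketch_* names are hypothetical, and the values 1412 for RXRPC_JUMBO_DATALEN and 4 for sizeof(struct rxrpc_jumbo_header) are assumed to match net/rxrpc/protocol.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror net/rxrpc/protocol.h: the data carried by a
 * non-terminal jumbo subpacket, and the size of the jumbo header that
 * precedes each subpacket after the first. */
#define SKETCH_JUMBO_DATALEN	1412u
#define SKETCH_JUMBO_HDRLEN	4u

/* Hypothetical stand-in for struct rxrpc_connection (serial counter only). */
struct sketch_conn {
	uint32_t tx_serial;	/* Next serial number to hand out */
};

/*
 * Reserve n consecutive serial numbers and return the first one.  Serial 0
 * is never used: if serial + n reaches the 32-bit wrap, or the counter is
 * sitting at 0, restart at 1 (the same test as rxrpc_get_next_serials()).
 */
static inline uint32_t sketch_get_next_serials(struct sketch_conn *conn,
					       unsigned int n)
{
	uint32_t serial;

	assert(n > 0);
	serial = conn->tx_serial;
	if ((uint32_t)(serial + n) <= n)	/* wrapped, or serial was 0 */
		serial = 1;
	conn->tx_serial = serial + n;
	return serial;
}

/*
 * Number of full-sized subpackets that fit in max_data bytes of payload
 * after the wire header.  k subpackets need k * DATALEN bytes of data plus
 * (k - 1) jumbo headers, so k = (max_data + HDRLEN) / (HDRLEN + DATALEN).
 */
static inline unsigned int sketch_jumbo_capacity(unsigned int max_data)
{
	return (max_data + SKETCH_JUMBO_HDRLEN) /
	       (SKETCH_JUMBO_HDRLEN + SKETCH_JUMBO_DATALEN);
}
```

For example, a 1500-byte MTU leaves 1444 bytes after the 20-byte IPv4 header, the 8-byte UDP header and the 28-byte rxrpc wire header, and sketch_jumbo_capacity(1444) evaluates to 1, so there is no room for a second subpacket on such a path.  As written, the serial check also restarts at 1 whenever serial + n reaches 2^32, even when the reserved block itself would not include 0.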
Jumbo packets will allow larger chunks of data to be sent without reducing
retryability, as the subpackets in a jumbo packet can also be retransmitted
individually.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/ar-internal.h | 18 +++++++++-
 net/rxrpc/call_event.c  | 48 ++++++++++++++----------
 net/rxrpc/input.c       | 36 +++++++++++--------
 net/rxrpc/insecure.c    |  2 ++
 net/rxrpc/output.c      | 80 ++++++++++++++++++++++++++++-------------
 net/rxrpc/rxkad.c       | 13 +++++++
 6 files changed, 137 insertions(+), 60 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a5c0bc917641..4386b2e6cca5 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -832,6 +832,7 @@ struct rxrpc_txbuf {
 	__be16 cksum; /* Checksum to go in header */
 	unsigned short ack_rwind; /* ACK receive window */
 	u8 /*enum rxrpc_propose_ack_trace*/ ack_why; /* If ack, why */
+	bool jumboable; /* Can be non-terminal jumbo subpacket */
 	u8 nr_kvec; /* Amount of kvec[] used */
 	struct kvec kvec[3];
 };
@@ -862,6 +863,21 @@ static inline rxrpc_serial_t rxrpc_get_next_serial(struct rxrpc_connection *conn
 	return serial;
 }
 
+/*
+ * Allocate the next serial n numbers on a connection.  0 must be skipped.
+ */
+static inline rxrpc_serial_t rxrpc_get_next_serials(struct rxrpc_connection *conn,
+						     unsigned int n)
+{
+	rxrpc_serial_t serial;
+
+	serial = conn->tx_serial;
+	if (serial + n <= n)
+		serial = 1;
+	conn->tx_serial = serial + n;
+	return serial;
+}
+
 /*
  * af_rxrpc.c
  */
@@ -1176,7 +1192,7 @@ int rxrpc_send_abort_packet(struct rxrpc_call *);
 void rxrpc_send_conn_abort(struct rxrpc_connection *conn);
 void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb);
 void rxrpc_send_keepalive(struct rxrpc_peer *);
-void rxrpc_transmit_one(struct rxrpc_call *call, struct rxrpc_txbuf *txb);
+void rxrpc_transmit_data(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n);
 
 /*
  * peer_event.c
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 1d889b6f0366..3379adfaaf65 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -124,7 +124,7 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
 					       ktime_sub(resend_at, now));
 
 			txb->flags |= RXRPC_TXBUF_RESENT;
-			rxrpc_transmit_one(call, txb);
+			rxrpc_transmit_data(call, txb, 1);
 			did_send = true;
 			now = ktime_get_real();
 
@@ -164,7 +164,7 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
 		unacked = true;
 
 		txb->flags |= RXRPC_TXBUF_RESENT;
-		rxrpc_transmit_one(call, txb);
+		rxrpc_transmit_data(call, txb, 1);
 		did_send = true;
 		rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans);
 		now = ktime_get_real();
@@ -231,15 +231,12 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)
 	}
 }
 
-static bool rxrpc_tx_window_has_space(struct rxrpc_call *call)
+static unsigned int rxrpc_tx_window_space(struct rxrpc_call *call)
 {
-	unsigned int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra);
-	rxrpc_seq_t window = call->acks_hard_ack, wtop = window + winsize;
-	rxrpc_seq_t tx_top = call->tx_top;
-	int space;
+	int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra);
+	int in_flight = call->tx_top - call->acks_hard_ack;
 
-	space = wtop - tx_top;
-	return space > 0;
+	return max(winsize - in_flight, 0);
 }
 
 /*
@@ -247,7 +244,7 @@ static bool rxrpc_tx_window_has_space(struct rxrpc_call *call)
  */
 static void rxrpc_decant_prepared_tx(struct rxrpc_call *call)
 {
-	struct rxrpc_txbuf *txb;
+	int space = rxrpc_tx_window_space(call);
 
 	if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
 		if (list_empty(&call->tx_sendmsg))
@@ -255,22 +252,33 @@ static void rxrpc_decant_prepared_tx(struct rxrpc_call *call)
 		rxrpc_expose_client_call(call);
 	}
 
-	while ((txb = list_first_entry_or_null(&call->tx_sendmsg,
-					       struct rxrpc_txbuf, call_link))) {
+	while (space > 0) {
+		struct rxrpc_txbuf *head = NULL, *txb;
+		int count = 0, limit = min(space, 1);
+
+		if (list_empty(&call->tx_sendmsg))
+			break;
+
 		spin_lock(&call->tx_lock);
-		list_del(&txb->call_link);
+		do {
+			txb = list_first_entry(&call->tx_sendmsg,
+					       struct rxrpc_txbuf, call_link);
+			if (!head)
+				head = txb;
+			list_move_tail(&txb->call_link, &call->tx_buffer);
+			count++;
+			if (!txb->jumboable)
+				break;
+		} while (count < limit && !list_empty(&call->tx_sendmsg));
+
 		spin_unlock(&call->tx_lock);
 
 		call->tx_top = txb->seq;
-		list_add_tail(&txb->call_link, &call->tx_buffer);
-
 		if (txb->flags & RXRPC_LAST_PACKET)
 			rxrpc_close_tx_phase(call);
 
-		rxrpc_transmit_one(call, txb);
-
-		if (!rxrpc_tx_window_has_space(call))
-			break;
+		space -= count;
+		rxrpc_transmit_data(call, head, count);
 	}
 }
 
@@ -285,7 +293,7 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call)
 
 	case RXRPC_CALL_SERVER_SEND_REPLY:
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
-		if (!rxrpc_tx_window_has_space(call))
+		if (!rxrpc_tx_window_space(call))
 			return;
 		if (list_empty(&call->tx_sendmsg)) {
 			rxrpc_inc_stat(call->rxnet, stat_tx_data_underflow);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index fd08d813ef29..8398fa10ee8d 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -693,9 +693,12 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_peer *peer = call->peer;
-	unsigned int max_data;
+	unsigned int max_data, capacity;
 	bool wake = false;
-	u32 rwind = ntohl(trailer->rwind);
+	u32 max_mtu = ntohl(trailer->maxMTU);
+	//u32 if_mtu = ntohl(trailer->ifMTU);
+	u32 rwind = ntohl(trailer->rwind);
+	u32 jumbo_max = ntohl(trailer->jumbo_max);
 
 	if (rwind > RXRPC_TX_MAX_WINDOW)
 		rwind = RXRPC_TX_MAX_WINDOW;
@@ -706,24 +709,29 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb
 		call->tx_winsize = rwind;
 	}
 
-	if (trailer->jumbo_max == 0) {
-		/* The peer says it supports pmtu discovery */
-		peer->ackr_adv_pmtud = true;
-	} else {
-		peer->ackr_adv_pmtud = false;
-	}
-
-	max_data = ntohl(trailer->maxMTU);
-	peer->ackr_max_data = max_data;
+	max_mtu = clamp(max_mtu, 500, 65535);
+	peer->ackr_max_data = max_mtu;
 
-	if (max_data < peer->max_data) {
-		trace_rxrpc_pmtud_reduce(peer, sp->hdr.serial, max_data,
+	if (max_mtu < peer->max_data) {
+		trace_rxrpc_pmtud_reduce(peer, sp->hdr.serial, max_mtu,
 					 rxrpc_pmtud_reduce_ack);
 		write_seqcount_begin(&peer->mtu_lock);
-		peer->max_data = max_data;
+		peer->max_data = max_mtu;
 		write_seqcount_end(&peer->mtu_lock);
 	}
 
+	max_data = umin(max_mtu, peer->max_data);
+	capacity = max_data;
+	capacity += sizeof(struct rxrpc_jumbo_header); /* First subpacket has main hdr, not jumbo */
+	capacity /= sizeof(struct rxrpc_jumbo_header) + RXRPC_JUMBO_DATALEN;
+
+	if (jumbo_max == 0) {
+		/* The peer says it supports pmtu discovery */
+		peer->ackr_adv_pmtud = true;
+	} else {
+		peer->ackr_adv_pmtud = false;
+	}
+
 	if (wake)
 		wake_up(&call->waitq);
 }
diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
index d665f486be5f..e068f9b79d02 100644
--- a/net/rxrpc/insecure.c
+++ b/net/rxrpc/insecure.c
@@ -25,6 +25,8 @@ static struct rxrpc_txbuf *none_alloc_txbuf(struct rxrpc_call *call, size_t rema
 static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
 {
 	txb->pkt_len = txb->len;
+	if (txb->len == RXRPC_JUMBO_DATALEN)
+		txb->jumboable = true;
 	return 0;
 }
 
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index df9af4ad4260..aededdd474d7 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -377,9 +377,10 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
  */
 static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb,
 					    rxrpc_serial_t serial,
-					    int subpkt)
+					    int subpkt, int nr_subpkts)
 {
 	struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base;
+	struct rxrpc_jumbo_header *jumbo = (void *)(whdr + 1) - sizeof(*jumbo);
 	enum rxrpc_req_ack_trace why;
 	struct rxrpc_connection *conn = call->conn;
 	struct kvec *kv = &call->local->kvec[subpkt];
@@ -399,6 +400,11 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc
 	flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS;
 	last = txb->flags & RXRPC_LAST_PACKET;
 
+	if (subpkt < nr_subpkts - 1) {
+		len = RXRPC_JUMBO_DATALEN;
+		goto dont_set_request_ack;
+	}
+
 	more = (!list_is_last(&txb->call_link, &call->tx_buffer) ||
 		!list_empty(&call->tx_sendmsg));
 
@@ -436,13 +442,25 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc
 	}
 
 dont_set_request_ack:
-	whdr->flags = flags;
-	whdr->serial = htonl(txb->serial);
-	whdr->cksum = txb->cksum;
-	whdr->serviceId = htons(conn->service_id);
-	kv->iov_base = whdr;
-	len += sizeof(*whdr);
-	// TODO: Convert into a jumbo header for tail subpackets
+	/* The jumbo header overlays the wire header in the txbuf. */
+	if (subpkt < nr_subpkts - 1)
+		flags |= RXRPC_JUMBO_PACKET;
+	else
+		flags &= ~RXRPC_JUMBO_PACKET;
+	if (subpkt == 0) {
+		whdr->flags = flags;
+		whdr->serial = htonl(txb->serial);
+		whdr->cksum = txb->cksum;
+		whdr->serviceId = htons(conn->service_id);
+		kv->iov_base = whdr;
+		len += sizeof(*whdr);
+	} else {
+		jumbo->flags = flags;
+		jumbo->pad = 0;
+		jumbo->cksum = txb->cksum;
+		kv->iov_base = jumbo;
+		len += sizeof(*jumbo);
+	}
 
 	trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, false);
 	kv->iov_len = len;
@@ -450,18 +468,22 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc
 }
 
 /*
- * Prepare a packet for transmission.
+ * Prepare a (jumbo) packet for transmission.
  */
-static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
+static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *head, int n)
 {
+	struct rxrpc_txbuf *txb = head;
 	rxrpc_serial_t serial;
 	size_t len = 0;
 
 	/* Each transmission of a Tx packet needs a new serial number */
-	serial = rxrpc_get_next_serial(call->conn);
+	serial = rxrpc_get_next_serials(call->conn, n);
 
-	len += rxrpc_prepare_data_subpacket(call, txb, serial, 0);
-	// TODO: Loop around adding tail subpackets
+	for (int i = 0; i < n; i++) {
+		len += rxrpc_prepare_data_subpacket(call, txb, serial, i, n);
+		serial++;
+		txb = list_next_entry(txb, call_link);
+	}
 
 	return len;
 }
@@ -469,16 +491,24 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_tx
 /*
  * Set timeouts after transmitting a packet.
  */
-static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
+static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n)
 {
+	rxrpc_serial_t serial;
 	ktime_t now = ktime_get_real();
 	bool ack_requested = txb->flags & RXRPC_REQUEST_ACK;
+	int i;
 
 	call->tx_last_sent = now;
-	txb->last_sent = now;
+
+	for (i = 0; i < n; i++) {
+		txb->last_sent = now;
+		ack_requested |= txb->flags & RXRPC_REQUEST_ACK;
+		serial = txb->serial;
+		txb = list_next_entry(txb, call_link);
+	}
 
 	if (ack_requested) {
-		rxrpc_begin_rtt_probe(call, txb->serial, now, rxrpc_rtt_tx_data);
+		rxrpc_begin_rtt_probe(call, serial, now, rxrpc_rtt_tx_data);
 		call->peer->rtt_last_req = now;
 
 		if (call->peer->rtt_count > 1) {
@@ -502,7 +532,7 @@ static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbu
 /*
  * send a packet through the transport endpoint
  */
-static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
+static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n)
 {
 	struct rxrpc_connection *conn = call->conn;
 	enum rxrpc_tx_point frag;
@@ -512,7 +542,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 
 	_enter("%x,{%d}", txb->seq, txb->pkt_len);
 
-	len = rxrpc_prepare_data_packet(call, txb);
+	len = rxrpc_prepare_data_packet(call, txb, n);
 
 	if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
 		static int lose;
@@ -524,7 +554,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 		}
 	}
 
-	iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, 1, len);
+	iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, n, len);
 
 	msg.msg_name = &call->peer->srx.transport;
 	msg.msg_namelen = call->peer->srx.transport_len;
@@ -537,7 +567,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 	 * yet.
 	 */
 	if (txb->seq == call->tx_transmitted + 1)
-		call->tx_transmitted = txb->seq;
+		call->tx_transmitted = txb->seq + n - 1;
 
 	/* send the packet with the don't fragment bit set if we currently
 	 * think it's small enough */
@@ -568,7 +598,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 	}
 
 	rxrpc_tx_backoff(call, ret);
-	if (ret == -EMSGSIZE && frag == rxrpc_tx_point_call_data_frag) {
+	if (ret == -EMSGSIZE && frag == rxrpc_tx_point_call_data_nofrag) {
 		rxrpc_local_dont_fragment(conn->local, false);
 		frag = rxrpc_tx_point_call_data_frag;
 		goto retry;
@@ -576,7 +606,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 
 done:
 	if (ret >= 0) {
-		rxrpc_tstamp_data_packets(call, txb);
+		rxrpc_tstamp_data_packets(call, txb, n);
 	} else {
 		/* Cancel the call if the initial transmission fails,
 		 * particularly if that's due to network routing issues that
@@ -776,13 +806,13 @@ static inline void rxrpc_instant_resend(struct rxrpc_call *call,
 }
 
 /*
- * Transmit one packet.
+ * Transmit a packet, possibly gluing several subpackets together.
  */
-void rxrpc_transmit_one(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
+void rxrpc_transmit_data(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n)
 {
 	int ret;
 
-	ret = rxrpc_send_data_packet(call, txb);
+	ret = rxrpc_send_data_packet(call, txb, n);
 	if (ret < 0) {
 		switch (ret) {
 		case -ENETUNREACH:
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 755897fab626..62b09d23ec08 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -392,15 +392,28 @@ static int rxkad_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
 		break;
 	case RXRPC_SECURITY_AUTH:
 		ret = rxkad_secure_packet_auth(call, txb, req);
+		if (txb->alloc_size == RXRPC_JUMBO_DATALEN)
+			txb->jumboable = true;
 		break;
 	case RXRPC_SECURITY_ENCRYPT:
 		ret = rxkad_secure_packet_encrypt(call, txb, req);
+		if (txb->alloc_size == RXRPC_JUMBO_DATALEN)
+			txb->jumboable = true;
 		break;
 	default:
 		ret = -EPERM;
 		break;
 	}
 
+	/* Clear excess space in the packet */
+	if (txb->pkt_len < txb->alloc_size) {
+		struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base;
+		size_t gap = txb->alloc_size - txb->pkt_len;
+		void *p = whdr + 1;
+
+		memset(p + txb->pkt_len, 0, gap);
+	}
+
 	skcipher_request_free(req);
 	_leave(" = %d [set %x]", ret, y);
 	return ret;

From patchwork Wed Dec 4 07:46:40 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893329
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 12/39] rxrpc: Add a tracepoint to show variables pertinent to jumbo packet size
Date: Wed, 4 Dec 2024 07:46:40 +0000
Message-ID: <20241204074710.990092-13-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Add a tracepoint, called right before packets are transmitted for the
first time, that shows the values of the variables pertinent to how many
subpackets will be added to a jumbo DATA packet.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 41 ++++++++++++++++++++++++++++++++++++
 net/rxrpc/call_event.c       |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9dcadad88e76..71f07e726a90 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -903,6 +903,47 @@ TRACE_EVENT(rxrpc_txqueue,
 		      __entry->tx_winsize)
 	    );
 
+TRACE_EVENT(rxrpc_transmit,
+	    TP_PROTO(struct rxrpc_call *call, int space),
+
+	    TP_ARGS(call, space),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	call)
+		    __field(rxrpc_seq_t,	seq)
+		    __field(u16,		space)
+		    __field(u16,		tx_winsize)
+		    __field(u16,		cong_cwnd)
+		    __field(u16,		cong_extra)
+		    __field(u16,		in_flight)
+		    __field(u16,		prepared)
+		    __field(u16,		pmtud_jumbo)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call	= call->debug_id;
+		    __entry->seq	= call->tx_bottom;
+		    __entry->space	= space;
+		    __entry->tx_winsize	= call->tx_winsize;
+		    __entry->cong_cwnd	= call->cong_cwnd;
+		    __entry->cong_extra	= call->cong_extra;
+		    __entry->prepared	= call->tx_prepared - call->tx_bottom;
+		    __entry->in_flight	= call->tx_top - call->acks_hard_ack;
+		    __entry->pmtud_jumbo = call->peer->pmtud_jumbo;
+			   ),
+
+	    TP_printk("c=%08x q=%08x sp=%u tw=%u cw=%u+%u pr=%u if=%u pj=%u",
+		      __entry->call,
+		      __entry->seq,
+		      __entry->space,
+		      __entry->tx_winsize,
+		      __entry->cong_cwnd,
+		      __entry->cong_extra,
+		      __entry->prepared,
+		      __entry->in_flight,
+		      __entry->pmtud_jumbo)
+	    );
+
 TRACE_EVENT(rxrpc_rx_data,
 	    TP_PROTO(unsigned int call, rxrpc_seq_t seq,
 		     rxrpc_serial_t serial, u8 flags),
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 3379adfaaf65..1f716f09d441 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -259,6 +259,8 @@ static void rxrpc_decant_prepared_tx(struct rxrpc_call *call)
 		if (list_empty(&call->tx_sendmsg))
 			break;
 
+		trace_rxrpc_transmit(call, space);
+
 		spin_lock(&call->tx_lock);
 		do {
 			txb = list_first_entry(&call->tx_sendmsg,

From patchwork Wed Dec 4 07:46:41 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893330
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 13/39] rxrpc: Fix CPU time starvation in I/O thread
Date: Wed, 4 Dec 2024 07:46:41 +0000
Message-ID: <20241204074710.990092-14-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Starvation can happen in the rxrpc I/O thread because it goes back to the
top of the I/O loop after it does any one thing, without trying to give
any other connection or call CPU time. Also, because it processes one call
packet at a time, it tries to do the retransmission loop after each ACK
without checking to see if there are other ACKs already in the queue that
can update the SACK state.

Fix this by:

 (1) Adding a received-packet queue on each call.

 (2) Distributing packets from the master Rx queue to the individual call,
     conn and error queues, and 'poking' calls to add them to the attend
     queue, first thing in the I/O thread.

 (3) Going through all the attention-seeking connections and calls before
     going back to the top of the I/O thread. Each queue is extracted as a
     whole and then gone through, so that any new additions are queued for
     the next pass rather than extending the current one.

 (4) Making the call event handler go through all the packets currently on
     the call's rx_queue before transmitting and retransmitting DATA
     packets.

 (5) Dropping the skb argument from the call event handler, as it is now
     replaced by the rx_queue. Instead, keep track of whether we received
     a packet or an ACK for the tests that used to rely on that argument.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h |   3 +
 net/rxrpc/ar-internal.h      |  10 +++-
 net/rxrpc/call_accept.c      |   2 +-
 net/rxrpc/call_event.c       |  34 +++++++-----
 net/rxrpc/call_object.c      |   2 +
 net/rxrpc/conn_client.c      |  12 ++--
 net/rxrpc/input.c            |   2 +-
 net/rxrpc/io_thread.c        | 104 ++++++++++++++++++-----------------
 net/rxrpc/peer_event.c       |   2 +-
 9 files changed, 96 insertions(+), 75 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 71f07e726a90..28fa7be31ff8 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -120,6 +120,7 @@
 	EM(rxrpc_call_poke_conn_abort,		"Conn-abort")	\
 	EM(rxrpc_call_poke_error,		"Error")	\
 	EM(rxrpc_call_poke_idle,		"Idle")		\
+	EM(rxrpc_call_poke_rx_packet,		"Rx-packet")	\
 	EM(rxrpc_call_poke_set_timeout,		"Set-timo")	\
 	EM(rxrpc_call_poke_start,		"Start")	\
 	EM(rxrpc_call_poke_timer,		"Timer")	\
@@ -128,6 +129,7 @@
 #define rxrpc_skb_traces \
 	EM(rxrpc_skb_eaten_by_unshare,		"ETN unshare  ") \
 	EM(rxrpc_skb_eaten_by_unshare_nomem,	"ETN unshar-nm") \
+	EM(rxrpc_skb_get_call_rx,		"GET call-rx  ") \
 	EM(rxrpc_skb_get_conn_secured,		"GET conn-secd") \
 	EM(rxrpc_skb_get_conn_work,		"GET conn-work") \
 	EM(rxrpc_skb_get_last_nack,		"GET last-nack") \
@@ -139,6 +141,7 @@
 	EM(rxrpc_skb_new_error_report,		"NEW error-rpt") \
 	EM(rxrpc_skb_new_jumbo_subpacket,	"NEW jumbo-sub") \
 	EM(rxrpc_skb_new_unshared,		"NEW unshared ") \
+	EM(rxrpc_skb_put_call_rx,		"PUT call-rx  ") \
 	EM(rxrpc_skb_put_conn_secured,		"PUT conn-secd") \
 	EM(rxrpc_skb_put_conn_work,		"PUT conn-work") \
 	EM(rxrpc_skb_put_error_report,		"PUT error-rep") \
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 4386b2e6cca5..55cc68dd1b40 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -705,6 +705,7 @@ struct rxrpc_call {
 
 	/* Received data tracking */
 	struct sk_buff_head	recvmsg_queue;	/* Queue of packets ready for recvmsg() */
+	struct sk_buff_head	rx_queue;	/* Queue of packets for this call to receive */
 	struct sk_buff_head	rx_oos_queue;	/* Queue of out of sequence packets */
 
 	rxrpc_seq_t		rx_highest_seq;	/* Higest sequence number received */
@@ -906,7 +907,7 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *, rxrpc_serial_t,
 void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *);
 void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb);
 
-bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb);
+bool rxrpc_input_call_event(struct rxrpc_call *call);
 
 /*
  * call_object.c
@@ -1352,6 +1353,13 @@ static inline bool after_eq(u32 seq1, u32 seq2)
 	return (s32)(seq1 - seq2) >= 0;
 }
 
+static inline void rxrpc_queue_rx_call_packet(struct rxrpc_call *call, struct sk_buff *skb)
+{
+	rxrpc_get_skb(skb, rxrpc_skb_get_call_rx);
+	__skb_queue_tail(&call->rx_queue, skb);
+	rxrpc_poke_call(call, rxrpc_call_poke_rx_packet);
+}
+
 /*
  * debug tracing
  */
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 0f5a1d77b890..a6776b1604ba 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -408,7 +408,7 @@ bool rxrpc_new_incoming_call(struct rxrpc_local *local,
 	}
 
 	_leave(" = %p{%d}", call, call->debug_id);
-	rxrpc_input_call_event(call, skb);
+	rxrpc_queue_rx_call_packet(call, skb);
 	rxrpc_put_call(call, rxrpc_call_put_input);
 	return true;
 
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 1f716f09d441..ef47de3f41c6 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -324,10 +324,11 @@ static void rxrpc_send_initial_ping(struct rxrpc_call *call)
 /*
  * Handle retransmission and deferred ACK/abort generation.
  */
-bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
+bool rxrpc_input_call_event(struct rxrpc_call *call)
 {
+	struct sk_buff *skb;
 	ktime_t now, t;
-	bool resend = false;
+	bool resend = false, did_receive = false, saw_ack = false;
 	s32 abort_code;
 
 	rxrpc_see_call(call, rxrpc_call_see_input);
@@ -337,9 +338,6 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 	       call->debug_id, rxrpc_call_states[__rxrpc_call_state(call)],
 	       call->events);
 
-	if (__rxrpc_call_is_complete(call))
-		goto out;
-
 	/* Handle abort request locklessly, vs rxrpc_propose_abort(). */
 	abort_code = smp_load_acquire(&call->send_abort);
 	if (abort_code) {
@@ -348,11 +346,21 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 		goto out;
 	}
 
-	if (skb && skb->mark == RXRPC_SKB_MARK_ERROR)
-		goto out;
+	while ((skb = __skb_dequeue(&call->rx_queue))) {
+		struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+
+		if (__rxrpc_call_is_complete(call) ||
+		    skb->mark == RXRPC_SKB_MARK_ERROR) {
+			rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
+			goto out;
+		}
+
+		saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
 
-	if (skb)
 		rxrpc_input_call_packet(call, skb);
+		rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
+		did_receive = true;
+	}
 
 	/* If we see our async-event poke, check for timeout trippage. */
 	now = ktime_get_real();
@@ -418,12 +426,8 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 				       rxrpc_propose_ack_ping_for_keepalive);
 	}
 
-	if (skb) {
-		struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-
-		if (sp->hdr.type == RXRPC_PACKET_TYPE_ACK)
-			rxrpc_congestion_degrade(call);
-	}
+	if (saw_ack)
+		rxrpc_congestion_degrade(call);
 
 	if (test_and_clear_bit(RXRPC_CALL_EV_INITIAL_PING, &call->events))
 		rxrpc_send_initial_ping(call);
@@ -494,7 +498,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 		if (call->security)
 			call->security->free_call_crypto(call);
 	} else {
-		if (skb &&
+		if (did_receive &&
 		    call->peer->ackr_adv_pmtud &&
 		    call->peer->pmtud_pending)
 			rxrpc_send_probe_for_pmtud(call);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 0df647d1d3a2..c026f16f891e 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -148,6 +148,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	INIT_LIST_HEAD(&call->attend_link);
 	INIT_LIST_HEAD(&call->tx_sendmsg);
 	INIT_LIST_HEAD(&call->tx_buffer);
+	skb_queue_head_init(&call->rx_queue);
 	skb_queue_head_init(&call->recvmsg_queue);
 	skb_queue_head_init(&call->rx_oos_queue);
 	init_waitqueue_head(&call->waitq);
@@ -536,6 +537,7 @@ void rxrpc_get_call(struct rxrpc_call *call, enum rxrpc_call_trace why)
 static void rxrpc_cleanup_ring(struct rxrpc_call *call)
 {
 	rxrpc_purge_queue(&call->recvmsg_queue);
+	rxrpc_purge_queue(&call->rx_queue);
 	rxrpc_purge_queue(&call->rx_oos_queue);
 }
 
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 86fb18bcd188..706631e6ac2f 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -508,16 +508,18 @@ static void rxrpc_activate_channels(struct rxrpc_bundle *bundle)
 void rxrpc_connect_client_calls(struct rxrpc_local *local)
 {
 	struct rxrpc_call *call;
+	LIST_HEAD(new_client_calls);
 
-	while ((call = list_first_entry_or_null(&local->new_client_calls,
-						struct rxrpc_call, wait_link))
-	       ) {
+	spin_lock(&local->client_call_lock);
+	list_splice_tail_init(&local->new_client_calls, &new_client_calls);
+	spin_unlock(&local->client_call_lock);
+
+	while ((call = list_first_entry_or_null(&new_client_calls,
+						struct rxrpc_call, wait_link))) {
 		struct rxrpc_bundle *bundle = call->bundle;
 
-		spin_lock(&local->client_call_lock);
 		list_move_tail(&call->wait_link, &bundle->waiting_calls);
 		rxrpc_see_call(call, rxrpc_call_see_waiting_call);
-		spin_unlock(&local->client_call_lock);
 
 		if (rxrpc_bundle_has_space(bundle))
 			rxrpc_activate_channels(bundle);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 8398fa10ee8d..96fe005c5e81 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1124,5 +1124,5 @@ void rxrpc_implicit_end_call(struct rxrpc_call *call, struct sk_buff *skb)
 		break;
 	}
 
-	rxrpc_input_call_event(call, skb);
+	rxrpc_input_call_event(call);
 }
diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c
index bd6d4f5e97b4..bc678a299bd8 100644
--- a/net/rxrpc/io_thread.c
+++ b/net/rxrpc/io_thread.c
@@ -338,7 +338,6 @@ static int rxrpc_input_packet_on_conn(struct rxrpc_connection *conn,
 	struct rxrpc_channel *chan;
 	struct rxrpc_call *call = NULL;
 	unsigned int channel;
-	bool ret;
 
 	if (sp->hdr.securityIndex != conn->security_ix)
 		return rxrpc_direct_abort(skb, rxrpc_eproto_wrong_security,
@@ -425,9 +424,9 @@ static int rxrpc_input_packet_on_conn(struct rxrpc_connection *conn,
 					  peer_srx, skb);
 	}
 
-	ret = rxrpc_input_call_event(call, skb);
+	rxrpc_queue_rx_call_packet(call, skb);
 	rxrpc_put_call(call, rxrpc_call_put_input);
-	return ret;
+	return true;
 }
 
 /*
@@ -444,6 +443,8 @@ int rxrpc_io_thread(void *data)
 	ktime_t now;
 #endif
 	bool should_stop;
+	LIST_HEAD(conn_attend_q);
+	LIST_HEAD(call_attend_q);
 
 	complete(&local->io_thread_ready);
 
@@ -454,43 +455,25 @@ int rxrpc_io_thread(void *data)
 	for (;;) {
 		rxrpc_inc_stat(local->rxnet, stat_io_loop);
 
-		/* Deal with connections that want immediate attention. */
-		conn = list_first_entry_or_null(&local->conn_attend_q,
-						struct rxrpc_connection,
-						attend_link);
-		if (conn) {
-			spin_lock_bh(&local->lock);
-			list_del_init(&conn->attend_link);
-			spin_unlock_bh(&local->lock);
-
-			rxrpc_input_conn_event(conn, NULL);
-			rxrpc_put_connection(conn, rxrpc_conn_put_poke);
-			continue;
+		/* Inject a delay into packets if requested. */
+#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY
+		now = ktime_get_real();
+		while ((skb = skb_peek(&local->rx_delay_queue))) {
+			if (ktime_before(now, skb->tstamp))
+				break;
+			skb = skb_dequeue(&local->rx_delay_queue);
+			skb_queue_tail(&local->rx_queue, skb);
 		}
+#endif
 
-		if (test_and_clear_bit(RXRPC_CLIENT_CONN_REAP_TIMER,
-				       &local->client_conn_flags))
-			rxrpc_discard_expired_client_conns(local);
-
-		/* Deal with calls that want immediate attention. */
-		if ((call = list_first_entry_or_null(&local->call_attend_q,
-						     struct rxrpc_call,
-						     attend_link))) {
-			spin_lock_bh(&local->lock);
-			list_del_init(&call->attend_link);
-			spin_unlock_bh(&local->lock);
-
-			trace_rxrpc_call_poked(call);
-			rxrpc_input_call_event(call, NULL);
-			rxrpc_put_call(call, rxrpc_call_put_poke);
-			continue;
+		if (!skb_queue_empty(&local->rx_queue)) {
+			spin_lock_irq(&local->rx_queue.lock);
+			skb_queue_splice_tail_init(&local->rx_queue, &rx_queue);
+			spin_unlock_irq(&local->rx_queue.lock);
 		}
 
-		if (!list_empty(&local->new_client_calls))
-			rxrpc_connect_client_calls(local);
-
-		/* Process received packets and errors. */
-		if ((skb = __skb_dequeue(&rx_queue))) {
+		/* Distribute packets and errors. */
+		while ((skb = __skb_dequeue(&rx_queue))) {
 			struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 			switch (skb->mark) {
 			case RXRPC_SKB_MARK_PACKET:
@@ -514,27 +497,46 @@ int rxrpc_io_thread(void *data)
 				rxrpc_free_skb(skb, rxrpc_skb_put_unknown);
 				break;
 			}
-			continue;
 		}
 
-		/* Inject a delay into packets if requested. */
-#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY
-		now = ktime_get_real();
-		while ((skb = skb_peek(&local->rx_delay_queue))) {
-			if (ktime_before(now, skb->tstamp))
-				break;
-			skb = skb_dequeue(&local->rx_delay_queue);
-			skb_queue_tail(&local->rx_queue, skb);
+		/* Deal with connections that want immediate attention. */
+		spin_lock_bh(&local->lock);
+		list_splice_tail_init(&local->conn_attend_q, &conn_attend_q);
+		spin_unlock_bh(&local->lock);
+
+		while ((conn = list_first_entry_or_null(&conn_attend_q,
+							struct rxrpc_connection,
+							attend_link))) {
+			spin_lock_bh(&local->lock);
+			list_del_init(&conn->attend_link);
+			spin_unlock_bh(&local->lock);
+			rxrpc_input_conn_event(conn, NULL);
+			rxrpc_put_connection(conn, rxrpc_conn_put_poke);
 		}
-#endif
 
-		if (!skb_queue_empty(&local->rx_queue)) {
-			spin_lock_irq(&local->rx_queue.lock);
-			skb_queue_splice_tail_init(&local->rx_queue, &rx_queue);
-			spin_unlock_irq(&local->rx_queue.lock);
-			continue;
+		if (test_and_clear_bit(RXRPC_CLIENT_CONN_REAP_TIMER,
+				       &local->client_conn_flags))
+			rxrpc_discard_expired_client_conns(local);
+
+		/* Deal with calls that want immediate attention. */
+		spin_lock_bh(&local->lock);
+		list_splice_tail_init(&local->call_attend_q, &call_attend_q);
+		spin_unlock_bh(&local->lock);
+
+		while ((call = list_first_entry_or_null(&call_attend_q,
+							struct rxrpc_call,
+							attend_link))) {
+			spin_lock_bh(&local->lock);
+			list_del_init(&call->attend_link);
+			spin_unlock_bh(&local->lock);
+			trace_rxrpc_call_poked(call);
+			rxrpc_input_call_event(call);
+			rxrpc_put_call(call, rxrpc_call_put_poke);
 		}
 
+		if (!list_empty(&local->new_client_calls))
+			rxrpc_connect_client_calls(local);
+
 		set_current_state(TASK_INTERRUPTIBLE);
 		should_stop = kthread_should_stop();
 		if (!skb_queue_empty(&local->rx_queue) ||
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 8fc9464a960c..ff30e0c05507 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -224,7 +224,7 @@ static void rxrpc_distribute_error(struct rxrpc_peer *peer, struct sk_buff *skb,
 
 		rxrpc_see_call(call, rxrpc_call_see_distribute_error);
 		rxrpc_set_call_completion(call, compl, 0, -err);
-		rxrpc_input_call_event(call, skb);
+		rxrpc_input_call_event(call);
 
 		spin_lock(&peer->lock);
 	}

From patchwork Wed Dec 4 07:46:42 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893331
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 14/39] rxrpc: Fix injection of packet loss
Date: Wed, 4 Dec 2024 07:46:42 +0000
Message-ID: <20241204074710.990092-15-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Fix the code that injects packet loss for testing so that it runs after
call->tx_transmitted has been updated. Previously, a deliberately dropped
packet skipped that update and so was never recorded as having been
transmitted.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/output.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index aededdd474d7..ca0da5e5d278 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -544,16 +544,6 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 
 	len = rxrpc_prepare_data_packet(call, txb, n);
 
-	if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
-		static int lose;
-		if ((lose++ & 7) == 7) {
-			ret = 0;
-			trace_rxrpc_tx_data(call, txb->seq, txb->serial,
-					    txb->flags, true);
-			goto done;
-		}
-	}
-
 	iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, n, len);
 
 	msg.msg_name = &call->peer->srx.transport;
@@ -579,6 +569,17 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 		frag = rxrpc_tx_point_call_data_nofrag;
 	}
 
+	if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
+		static int lose;
+
+		if ((lose++ & 7) == 7) {
+			ret = 0;
+			trace_rxrpc_tx_data(call, txb->seq, txb->serial,
+					    txb->flags, true);
+			goto done;
+		}
+	}
+
 retry:
 	/* send the packet by UDP
 	 * - returns -EMSGSIZE if UDP would have to fragment the packet

From patchwork Wed Dec 4 07:46:43 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893332
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 15/39] rxrpc: Only set DF=1 on initial DATA transmission
Date: Wed, 4 Dec 2024 07:46:43 +0000
Message-ID: <20241204074710.990092-16-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Change how the DF flag is managed on DATA transmissions: set it on the
initial transmission and don't set it on retransmissions. Then remove the
handling for EMSGSIZE in rxrpc_send_data_packet() and just pretend it
didn't happen, leaving it to the retransmission path to retry. The
path-MTU discovery using PING ACKs is then used to probe for the maximum
DATA size, though notification by ICMP will be used if one is received.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/ar-internal.h |  1 +
 net/rxrpc/output.c      | 32 ++++++++++++++++----------------
 net/rxrpc/proc.c        |  5 +++--
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 55cc68dd1b40..84efa21f176c 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -98,6 +98,7 @@ struct rxrpc_net {
 	atomic_t		stat_tx_data_send;
 	atomic_t		stat_tx_data_send_frag;
 	atomic_t		stat_tx_data_send_fail;
+	atomic_t		stat_tx_data_send_msgsize;
 	atomic_t		stat_tx_data_underflow;
 	atomic_t		stat_tx_data_cwnd_reset;
 	atomic_t		stat_rx_data;
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index ca0da5e5d278..3d992023f80f 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -552,16 +552,11 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 	msg.msg_controllen = 0;
 	msg.msg_flags = MSG_SPLICE_PAGES;
 
-	/* Track what we've attempted to transmit at least once so that the
-	 * retransmission algorithm doesn't try to resend what we haven't sent
-	 * yet.
+	/* Send the packet with the don't fragment bit set unless we think it's
+	 * too big or if this is a retransmission.
*/ - if (txb->seq == call->tx_transmitted + 1) - call->tx_transmitted = txb->seq + n - 1; - - /* send the packet with the don't fragment bit set if we currently - * think it's small enough */ - if (len >= sizeof(struct rxrpc_wire_header) + call->peer->max_data) { + if (txb->seq == call->tx_transmitted + 1 && + len >= sizeof(struct rxrpc_wire_header) + call->peer->max_data) { rxrpc_local_dont_fragment(conn->local, false); frag = rxrpc_tx_point_call_data_frag; } else { @@ -569,6 +564,13 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t frag = rxrpc_tx_point_call_data_nofrag; } + /* Track what we've attempted to transmit at least once so that the + * retransmission algorithm doesn't try to resend what we haven't sent + * yet. + */ + if (txb->seq == call->tx_transmitted + 1) + call->tx_transmitted = txb->seq + n - 1; + if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) { static int lose; @@ -580,7 +582,6 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t } } -retry: /* send the packet by UDP * - returns -EMSGSIZE if UDP would have to fragment the packet * to go out of the interface @@ -591,7 +592,11 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t ret = do_udp_sendmsg(conn->local->socket, &msg, len); conn->peer->last_tx_at = ktime_get_seconds(); - if (ret < 0) { + if (ret == -EMSGSIZE) { + rxrpc_inc_stat(call->rxnet, stat_tx_data_send_msgsize); + trace_rxrpc_tx_packet(call->debug_id, call->local->kvec[0].iov_base, frag); + ret = 0; + } else if (ret < 0) { rxrpc_inc_stat(call->rxnet, stat_tx_data_send_fail); trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret, frag); } else { @@ -599,11 +604,6 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t } rxrpc_tx_backoff(call, ret); - if (ret == -EMSGSIZE && frag == rxrpc_tx_point_call_data_nofrag) { - rxrpc_local_dont_fragment(conn->local, false); - frag = rxrpc_tx_point_call_data_frag; - goto retry; 
- } done: if (ret >= 0) { diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 44722c226064..249e1ed9c5c9 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -473,10 +473,11 @@ int rxrpc_stats_show(struct seq_file *seq, void *v) struct rxrpc_net *rxnet = rxrpc_net(seq_file_single_net(seq)); seq_printf(seq, - "Data : send=%u sendf=%u fail=%u\n", + "Data : send=%u sendf=%u fail=%u emsz=%u\n", atomic_read(&rxnet->stat_tx_data_send), atomic_read(&rxnet->stat_tx_data_send_frag), - atomic_read(&rxnet->stat_tx_data_send_fail)); + atomic_read(&rxnet->stat_tx_data_send_fail), + atomic_read(&rxnet->stat_tx_data_send_msgsize)); seq_printf(seq, "Data-Tx : nr=%u retrans=%u uf=%u cwr=%u\n", atomic_read(&rxnet->stat_tx_data),

From patchwork Wed Dec 4 07:46:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893333
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 16/39] rxrpc: Timestamp DATA packets before transmitting them
Date: Wed, 4 Dec 2024 07:46:44 +0000
Message-ID: <20241204074710.990092-17-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Move to setting the timestamp on DATA packets before transmitting them as part of the preparation.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/output.c | 56 ++++++++++++++-------------------------------- 1 file changed, 17 insertions(+), 39 deletions(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 3d992023f80f..400c3389d492 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -377,7 +377,8 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call) */ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb, rxrpc_serial_t serial, - int subpkt, int nr_subpkts) + int subpkt, int nr_subpkts, + ktime_t now) { struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; struct rxrpc_jumbo_header *jumbo = (void *)(whdr + 1) - sizeof(*jumbo); @@ -437,8 +438,9 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc rxrpc_inc_stat(call->rxnet, stat_why_req_ack[why]); trace_rxrpc_req_ack(call->debug_id, txb->seq, why); if (why != rxrpc_reqack_no_srv_last) { - txb->flags |= RXRPC_REQUEST_ACK; flags |= RXRPC_REQUEST_ACK; + rxrpc_begin_rtt_probe(call, serial, now, rxrpc_rtt_tx_data); + call->peer->rtt_last_req = now; } dont_set_request_ack: @@ -474,49 +476,25 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_tx { struct rxrpc_txbuf *txb = head; rxrpc_serial_t serial; + ktime_t now = ktime_get_real(); size_t len = 0; /* Each transmission of a Tx packet needs a new serial number */ serial = rxrpc_get_next_serials(call->conn, n); for (int i = 0; i < n; i++) { - len += rxrpc_prepare_data_subpacket(call, txb, serial, i, n); - serial++; - txb = list_next_entry(txb, call_link); - } - - return len; -} - -/* - * Set timeouts after transmitting a packet. 
- */ -static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n) -{ - rxrpc_serial_t serial; - ktime_t now = ktime_get_real(); - bool ack_requested = txb->flags & RXRPC_REQUEST_ACK; - int i; - - call->tx_last_sent = now; - - for (i = 0; i < n; i++) { txb->last_sent = now; - ack_requested |= txb->flags & RXRPC_REQUEST_ACK; - serial = txb->serial; + len += rxrpc_prepare_data_subpacket(call, txb, serial, i, n, now); + serial++; txb = list_next_entry(txb, call_link); } - if (ack_requested) { - rxrpc_begin_rtt_probe(call, serial, now, rxrpc_rtt_tx_data); - - call->peer->rtt_last_req = now; - if (call->peer->rtt_count > 1) { - ktime_t delay = rxrpc_get_rto_backoff(call->peer, false); + /* Set timeouts */ + if (call->peer->rtt_count > 1) { + ktime_t delay = rxrpc_get_rto_backoff(call->peer, false); - call->ack_lost_at = ktime_add(now, delay); - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); - } + call->ack_lost_at = ktime_add(now, delay); + trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); } if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) { @@ -527,6 +505,7 @@ static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbu } rxrpc_set_keepalive(call, now); + return len; } /* @@ -538,6 +517,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t enum rxrpc_tx_point frag; struct msghdr msg; size_t len; + bool new_call = test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags); int ret; _enter("%x,{%d}", txb->seq, txb->pkt_len); @@ -605,20 +585,18 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t rxrpc_tx_backoff(call, ret); -done: - if (ret >= 0) { - rxrpc_tstamp_data_packets(call, txb, n); - } else { + if (ret < 0) { /* Cancel the call if the initial transmission fails, * particularly if that's due to network routing issues that * aren't going away anytime soon. The layer above can arrange * the retransmission. 
*/ - if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) + if (new_call) rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR, RX_USER_ABORT, ret); } +done: _leave(" = %d [%u]", ret, call->peer->max_data); return ret; }

From patchwork Wed Dec 4 07:46:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893334
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 17/39] rxrpc: Don't need barrier for ->tx_bottom and ->acks_hard_ack
Date: Wed, 4 Dec 2024 07:46:45 +0000
Message-ID: <20241204074710.990092-18-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

We don't need a barrier for the ->tx_bottom value (which indicates the lowest sequence number still in the transmission queue) or the ->acks_hard_ack value (which tracks the DATA packets hard-ack'd by the latest ACK packet received and thus indicates which DATA packets can now be discarded), as the app thread doesn't use either value to locate memory that it then accesses. Rather, the app thread merely uses these as a guide to how much space is available in the transmission queue.

Change the code to use READ/WRITE_ONCE() instead.

Also, change rxrpc_check_tx_space() to use the same value for tx_bottom throughout.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/sendmsg.c | 8 +++++---
 net/rxrpc/txbuf.c   | 4 ++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 786c1fb1369a..467c9402882e 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -94,9 +94,11 @@ static int rxrpc_wait_to_be_connected(struct rxrpc_call *call, long *timeo) */ static bool rxrpc_check_tx_space(struct rxrpc_call *call, rxrpc_seq_t *_tx_win) { + rxrpc_seq_t tx_bottom = READ_ONCE(call->tx_bottom); + if (_tx_win) - *_tx_win = call->tx_bottom; - return call->tx_prepared - call->tx_bottom < 256; + *_tx_win = tx_bottom; + return call->tx_prepared - tx_bottom < 256; } /* @@ -138,7 +140,7 @@ static int rxrpc_wait_for_tx_window_waitall(struct rxrpc_sock *rx, rtt = 2; timeout = rtt; - tx_start = smp_load_acquire(&call->acks_hard_ack); + tx_start = READ_ONCE(call->acks_hard_ack); for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c index 8b7c854ed3d7..0cc8f49d69a9 100644 --- a/net/rxrpc/txbuf.c +++ b/net/rxrpc/txbuf.c @@ -214,14 +214,14 @@ void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *call) while ((txb = list_first_entry_or_null(&call->tx_buffer, struct rxrpc_txbuf, call_link))) { - hard_ack = smp_load_acquire(&call->acks_hard_ack); + hard_ack = call->acks_hard_ack; if (before(hard_ack, txb->seq)) break; if (txb->seq != call->tx_bottom + 1) rxrpc_see_txbuf(txb, rxrpc_txbuf_see_out_of_step); ASSERTCMP(txb->seq, ==, call->tx_bottom + 1); - smp_store_release(&call->tx_bottom, call->tx_bottom + 1); + WRITE_ONCE(call->tx_bottom, call->tx_bottom + 1); list_del_rcu(&txb->call_link); trace_rxrpc_txqueue(call, rxrpc_txqueue_dequeue);

From patchwork Wed Dec 4 07:46:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893335
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 18/39] rxrpc: Implement progressive transmission queue struct
Date: Wed, 4 Dec 2024 07:46:46 +0000
Message-ID: <20241204074710.990092-19-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

We need to scan the buffers in the transmission queue occasionally when processing ACKs, but the transmission queue is currently a linked list of transmission buffers which, when we eventually expand the Tx window to 8192 packets, will be very slow to walk.

Instead, pull the fields we need to examine frequently (last sent time, retransmitted flag) into a new struct rxrpc_txqueue and make each one hold an array of 32 or 64 packets (BITS_PER_LONG). The transmission queue is then a list of these structs, each pointing to a contiguous set of packets. Scanning is then much faster as the flags and timestamps are concentrated in the CPU dcache.

The transmission timestamps are stored as a number of microseconds from a base ktime to reduce memory requirements. This should be fine provided we manage to transmit an entire buffer within an hour (a 32-bit count of microseconds wraps after roughly 71.6 minutes).

This will make implementing RACK-TLP [RFC8985] easier as it will be less costly to scan the transmission buffers.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 98 ++++++++++++++--- net/rxrpc/ar-internal.h | 47 ++++++-- net/rxrpc/call_event.c | 204 ++++++++++++++++++++++------------- net/rxrpc/call_object.c | 38 ++++--- net/rxrpc/input.c | 72 ++++++++++--- net/rxrpc/output.c | 163 ++++++++++++++-------------- net/rxrpc/sendmsg.c | 69 +++++++++--- net/rxrpc/txbuf.c | 41 +------ 8 files changed, 467 insertions(+), 265 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 28fa7be31ff8..e6cf9ec940aa 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -297,7 +297,6 @@ #define rxrpc_txqueue_traces \ EM(rxrpc_txqueue_await_reply, "AWR") \ - EM(rxrpc_txqueue_dequeue, "DEQ") \ EM(rxrpc_txqueue_end, "END") \ EM(rxrpc_txqueue_queue, "QUE") \ EM(rxrpc_txqueue_queue_last, "QLS") \ @@ -482,6 +481,19 @@ EM(rxrpc_txbuf_see_send_more, "SEE SEND+ ") \ E_(rxrpc_txbuf_see_unacked, "SEE UNACKED") +#define rxrpc_tq_traces \ + EM(rxrpc_tq_alloc, "ALLOC") \ + EM(rxrpc_tq_cleaned, "CLEAN") \ + EM(rxrpc_tq_decant, "DCNT ") \ + EM(rxrpc_tq_decant_advance, "DCNT>") \ + EM(rxrpc_tq_queue, "QUEUE") \ + EM(rxrpc_tq_queue_dup, "QUE!!") \ + EM(rxrpc_tq_rotate, "ROT ") \ + EM(rxrpc_tq_rotate_and_free, "ROT-F") \ + EM(rxrpc_tq_rotate_and_keep, "ROT-K") \ + EM(rxrpc_tq_transmit, "XMIT ") \ + E_(rxrpc_tq_transmit_advance, "XMIT>") + #define rxrpc_pmtud_reduce_traces \ EM(rxrpc_pmtud_reduce_ack, "Ack ") \ EM(rxrpc_pmtud_reduce_icmp, "Icmp ") \ @@ -518,6 +530,7 @@ enum rxrpc_rtt_tx_trace { rxrpc_rtt_tx_traces } __mode(byte); enum rxrpc_sack_trace { rxrpc_sack_traces } __mode(byte); enum rxrpc_skb_trace { rxrpc_skb_traces } __mode(byte); enum rxrpc_timer_trace { rxrpc_timer_traces } __mode(byte); +enum rxrpc_tq_trace { rxrpc_tq_traces } __mode(byte); enum rxrpc_tx_point { rxrpc_tx_points } __mode(byte); enum rxrpc_txbuf_trace { 
rxrpc_txbuf_traces } __mode(byte); enum rxrpc_txqueue_trace { rxrpc_txqueue_traces } __mode(byte); @@ -554,6 +567,7 @@ rxrpc_rtt_tx_traces; rxrpc_sack_traces; rxrpc_skb_traces; rxrpc_timer_traces; +rxrpc_tq_traces; rxrpc_tx_points; rxrpc_txbuf_traces; rxrpc_txqueue_traces; @@ -881,7 +895,7 @@ TRACE_EVENT(rxrpc_txqueue, __field(rxrpc_seq_t, acks_hard_ack) __field(rxrpc_seq_t, tx_bottom) __field(rxrpc_seq_t, tx_top) - __field(rxrpc_seq_t, tx_prepared) + __field(rxrpc_seq_t, send_top) __field(int, tx_winsize) ), @@ -891,7 +905,7 @@ TRACE_EVENT(rxrpc_txqueue, __entry->acks_hard_ack = call->acks_hard_ack; __entry->tx_bottom = call->tx_bottom; __entry->tx_top = call->tx_top; - __entry->tx_prepared = call->tx_prepared; + __entry->send_top = call->send_top; __entry->tx_winsize = call->tx_winsize; ), @@ -902,14 +916,14 @@ TRACE_EVENT(rxrpc_txqueue, __entry->acks_hard_ack, __entry->tx_top - __entry->tx_bottom, __entry->tx_top - __entry->acks_hard_ack, - __entry->tx_prepared - __entry->tx_bottom, + __entry->send_top - __entry->tx_top, __entry->tx_winsize) ); TRACE_EVENT(rxrpc_transmit, - TP_PROTO(struct rxrpc_call *call, int space), + TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t send_top, int space), - TP_ARGS(call, space), + TP_ARGS(call, send_top, space), TP_STRUCT__entry( __field(unsigned int, call) @@ -925,12 +939,12 @@ TRACE_EVENT(rxrpc_transmit, TP_fast_assign( __entry->call = call->debug_id; - __entry->seq = call->tx_bottom; + __entry->seq = call->tx_top + 1; __entry->space = space; __entry->tx_winsize = call->tx_winsize; __entry->cong_cwnd = call->cong_cwnd; __entry->cong_extra = call->cong_extra; - __entry->prepared = call->tx_prepared - call->tx_bottom; + __entry->prepared = send_top - call->tx_bottom; __entry->in_flight = call->tx_top - call->acks_hard_ack; __entry->pmtud_jumbo = call->peer->pmtud_jumbo; ), @@ -947,6 +961,32 @@ TRACE_EVENT(rxrpc_transmit, __entry->pmtud_jumbo) ); +TRACE_EVENT(rxrpc_tx_rotate, + TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, 
rxrpc_seq_t to), + + TP_ARGS(call, seq, to), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_seq_t, seq) + __field(rxrpc_seq_t, to) + __field(rxrpc_seq_t, top) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->seq = seq; + __entry->to = to; + __entry->top = call->tx_top; + ), + + TP_printk("c=%08x q=%08x-%08x-%08x", + __entry->call, + __entry->seq, + __entry->to, + __entry->top) + ); + TRACE_EVENT(rxrpc_rx_data, TP_PROTO(unsigned int call, rxrpc_seq_t seq, rxrpc_serial_t serial, u8 flags), @@ -1621,10 +1661,11 @@ TRACE_EVENT(rxrpc_drop_ack, ); TRACE_EVENT(rxrpc_retransmit, - TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, - rxrpc_serial_t serial, ktime_t expiry), + TP_PROTO(struct rxrpc_call *call, + struct rxrpc_send_data_req *req, + struct rxrpc_txbuf *txb, ktime_t expiry), - TP_ARGS(call, seq, serial, expiry), + TP_ARGS(call, req, txb, expiry), TP_STRUCT__entry( __field(unsigned int, call) @@ -1635,8 +1676,8 @@ TRACE_EVENT(rxrpc_retransmit, TP_fast_assign( __entry->call = call->debug_id; - __entry->seq = seq; - __entry->serial = serial; + __entry->seq = req->seq; + __entry->serial = txb->serial; __entry->expiry = expiry; ), @@ -1714,9 +1755,9 @@ TRACE_EVENT(rxrpc_reset_cwnd, __entry->cwnd = call->cong_cwnd; __entry->extra = call->cong_extra; __entry->hard_ack = call->acks_hard_ack; - __entry->prepared = call->tx_prepared - call->tx_bottom; + __entry->prepared = call->send_top - call->tx_bottom; __entry->since_last_tx = ktime_sub(now, call->tx_last_sent); - __entry->has_data = !list_empty(&call->tx_sendmsg); + __entry->has_data = call->tx_bottom != call->tx_top; ), TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu d=%u", @@ -2024,6 +2065,33 @@ TRACE_EVENT(rxrpc_txbuf, __entry->ref) ); +TRACE_EVENT(rxrpc_tq, + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, + rxrpc_seq_t seq, enum rxrpc_tq_trace trace), + + TP_ARGS(call, tq, seq, trace), + + TP_STRUCT__entry( + __field(unsigned int, call_debug_id) + 
__field(rxrpc_seq_t, qbase) + __field(rxrpc_seq_t, seq) + __field(enum rxrpc_tq_trace, trace) + ), + + TP_fast_assign( + __entry->call_debug_id = call->debug_id; + __entry->qbase = tq ? tq->qbase : call->tx_qbase; + __entry->seq = seq; + __entry->trace = trace; + ), + + TP_printk("c=%08x bq=%08x q=%08x %s", + __entry->call_debug_id, + __entry->qbase, + __entry->seq, + __print_symbolic(__entry->trace, rxrpc_tq_traces)) + ); + TRACE_EVENT(rxrpc_poke_call, TP_PROTO(struct rxrpc_call *call, bool busy, enum rxrpc_call_poke_trace what), diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 84efa21f176c..bcce4862b0b7 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -30,6 +30,7 @@ struct rxrpc_crypt { struct key_preparsed_payload; struct rxrpc_connection; struct rxrpc_txbuf; +struct rxrpc_txqueue; /* * Mark applied to socket buffers in skb->mark. skb->priority is used @@ -691,13 +692,17 @@ struct rxrpc_call { unsigned short rx_pkt_offset; /* Current recvmsg packet offset */ unsigned short rx_pkt_len; /* Current recvmsg packet len */ + /* Sendmsg data tracking. */ + rxrpc_seq_t send_top; /* Highest Tx slot filled by sendmsg. */ + struct rxrpc_txqueue *send_queue; /* Queue that sendmsg is writing into */ + /* Transmitted data tracking. */ spinlock_t tx_lock; /* Transmit queue lock */ - struct list_head tx_sendmsg; /* Sendmsg prepared packets */ - struct list_head tx_buffer; /* Buffer of transmissible packets */ + struct rxrpc_txqueue *tx_queue; /* Start of transmission buffers */ + struct rxrpc_txqueue *tx_qtail; /* End of transmission buffers */ + rxrpc_seq_t tx_qbase; /* First slot in tx_queue */ rxrpc_seq_t tx_bottom; /* First packet in buffer */ rxrpc_seq_t tx_transmitted; /* Highest packet transmitted */ - rxrpc_seq_t tx_prepared; /* Highest Tx slot prepared. */ rxrpc_seq_t tx_top; /* Highest Tx slot allocated. 
*/ u16 tx_backoff; /* Delay to insert due to Tx failure (ms) */ u8 tx_winsize; /* Maximum size of Tx window */ @@ -815,9 +820,6 @@ struct rxrpc_send_params { * Buffer of data to be output as a packet. */ struct rxrpc_txbuf { - struct list_head call_link; /* Link in call->tx_sendmsg/tx_buffer */ - struct list_head tx_link; /* Link in live Enc queue or Tx queue */ - ktime_t last_sent; /* Time at which last transmitted */ refcount_t ref; rxrpc_seq_t seq; /* Sequence number of this packet */ rxrpc_serial_t serial; /* Last serial number transmitted with */ @@ -849,6 +851,36 @@ static inline bool rxrpc_sending_to_client(const struct rxrpc_txbuf *txb) return !rxrpc_sending_to_server(txb); } +/* + * Transmit queue element, including RACK [RFC8985] per-segment metadata. The + * transmission timestamp is in usec from the base. + */ +struct rxrpc_txqueue { + /* Start with the members we want to prefetch. */ + struct rxrpc_txqueue *next; + ktime_t xmit_ts_base; + rxrpc_seq_t qbase; + + /* The arrays we want to pack into as few cache lines as possible. */ + struct { +#define RXRPC_NR_TXQUEUE BITS_PER_LONG +#define RXRPC_TXQ_MASK (RXRPC_NR_TXQUEUE - 1) + struct rxrpc_txbuf *bufs[RXRPC_NR_TXQUEUE]; + unsigned int segment_xmit_ts[RXRPC_NR_TXQUEUE]; + } ____cacheline_aligned; +}; + +/* + * Data transmission request. 
+ */ +struct rxrpc_send_data_req { + ktime_t now; /* Current time */ + struct rxrpc_txqueue *tq; /* Tx queue segment holding first DATA */ + rxrpc_seq_t seq; /* Sequence of first data */ + int n; /* Number of DATA packets to glue into jumbo */ + bool did_send; /* T if did actually send */ +}; + #include /* @@ -905,7 +937,6 @@ void rxrpc_propose_ping(struct rxrpc_call *call, u32 serial, enum rxrpc_propose_ack_trace why); void rxrpc_propose_delay_ACK(struct rxrpc_call *, rxrpc_serial_t, enum rxrpc_propose_ack_trace); -void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *); void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb); bool rxrpc_input_call_event(struct rxrpc_call *call); @@ -1191,10 +1222,10 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why); void rxrpc_send_probe_for_pmtud(struct rxrpc_call *call); int rxrpc_send_abort_packet(struct rxrpc_call *); +void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req); void rxrpc_send_conn_abort(struct rxrpc_connection *conn); void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb); void rxrpc_send_keepalive(struct rxrpc_peer *); -void rxrpc_transmit_data(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n); /* * peer_event.c diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index ef47de3f41c6..90e3d9395675 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -62,57 +62,85 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call) set_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags); } +/* + * Retransmit one or more packets. 
+ */ +static void rxrpc_retransmit_data(struct rxrpc_call *call, + struct rxrpc_send_data_req *req, + ktime_t rto) +{ + struct rxrpc_txqueue *tq = req->tq; + unsigned int ix = req->seq & RXRPC_TXQ_MASK; + struct rxrpc_txbuf *txb = tq->bufs[ix]; + ktime_t xmit_ts, resend_at; + + _enter("%x,%x,%x,%x", tq->qbase, req->seq, ix, txb->debug_id); + + xmit_ts = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]); + resend_at = ktime_add(xmit_ts, rto); + trace_rxrpc_retransmit(call, req, txb, + ktime_sub(resend_at, req->now)); + + txb->flags |= RXRPC_TXBUF_RESENT; + rxrpc_send_data_packet(call, req); + rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans); + + req->tq = NULL; + req->n = 0; + req->did_send = true; + req->now = ktime_get_real(); +} + /* * Perform retransmission of NAK'd and unack'd packets. */ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) { + struct rxrpc_send_data_req req = { + .now = ktime_get_real(), + }; struct rxrpc_ackpacket *ack = NULL; struct rxrpc_skb_priv *sp; + struct rxrpc_txqueue *tq; struct rxrpc_txbuf *txb; - rxrpc_seq_t transmitted = call->tx_transmitted; + rxrpc_seq_t transmitted = call->tx_transmitted, seq; ktime_t next_resend = KTIME_MAX, rto = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC); - ktime_t resend_at = KTIME_MAX, now, delay; + ktime_t resend_at = KTIME_MAX, delay; bool unacked = false, did_send = false; - unsigned int i; + unsigned int qix; _enter("{%d,%d}", call->acks_hard_ack, call->tx_top); - now = ktime_get_real(); - - if (list_empty(&call->tx_buffer)) + if (call->tx_bottom == call->tx_top) goto no_resend; trace_rxrpc_resend(call, ack_skb); - txb = list_first_entry(&call->tx_buffer, struct rxrpc_txbuf, call_link); + tq = call->tx_queue; + seq = call->tx_bottom; - /* Scan the soft ACK table without dropping the lock and resend any - * explicitly NAK'd packets. - */ + /* Scan the soft ACK table and resend any explicitly NAK'd packets. 
*/ if (ack_skb) { sp = rxrpc_skb(ack_skb); ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header); - for (i = 0; i < sp->ack.nr_acks; i++) { - rxrpc_seq_t seq; + for (int i = 0; i < sp->ack.nr_acks; i++) { + rxrpc_seq_t aseq; if (ack->acks[i] & 1) continue; - seq = sp->ack.first_ack + i; - if (after(txb->seq, transmitted)) - break; - if (after(txb->seq, seq)) - continue; /* A new hard ACK probably came in */ - list_for_each_entry_from(txb, &call->tx_buffer, call_link) { - if (txb->seq == seq) - goto found_txb; - } - goto no_further_resend; + aseq = sp->ack.first_ack + i; + while (after_eq(aseq, tq->qbase + RXRPC_NR_TXQUEUE)) + tq = tq->next; + seq = aseq; + qix = seq - tq->qbase; + txb = tq->bufs[qix]; + if (after(seq, transmitted)) + goto no_further_resend; - found_txb: - resend_at = ktime_add(txb->last_sent, rto); + resend_at = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[qix]); + resend_at = ktime_add(resend_at, rto); if (after(txb->serial, call->acks_highest_serial)) { - if (ktime_after(resend_at, now) && + if (ktime_after(resend_at, req.now) && ktime_before(resend_at, next_resend)) next_resend = resend_at; continue; /* Ack point not yet reached */ @@ -120,17 +148,13 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) rxrpc_see_txbuf(txb, rxrpc_txbuf_see_unacked); - trace_rxrpc_retransmit(call, txb->seq, txb->serial, - ktime_sub(resend_at, now)); - - txb->flags |= RXRPC_TXBUF_RESENT; - rxrpc_transmit_data(call, txb, 1); - did_send = true; - now = ktime_get_real(); + req.tq = tq; + req.seq = seq; + req.n = 1; + rxrpc_retransmit_data(call, &req, rto); - if (list_is_last(&txb->call_link, &call->tx_buffer)) + if (after_eq(seq, call->tx_top)) goto no_further_resend; - txb = list_next_entry(txb, call_link); } } @@ -139,35 +163,43 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) * ACK'd or NACK'd in due course, so don't worry about it here; here we * need to consider retransmitting anything beyond that point. 
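The soft-ACK scan above no longer searches a list for each NAK'd packet. It computes the sequence number directly as `first_ack + i` and relies on wrap-safe comparisons (`after()`, `after_eq()`) both to stop at untransmitted data and to step the segment pointer. The sketch below shows only the selection logic. It assumes rxrpc's comparison helpers behave like the signed-distance form `(s32)(a - b)` used here. The names `seq_after`, `seq_after_eq` and `collect_naks` are hypothetical. The sketch omits the segment walk and the RTO check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe sequence comparisons: compare the signed 32-bit distance. */
static inline int seq_after(uint32_t a, uint32_t b)    { return (int32_t)(a - b) > 0; }
static inline int seq_after_eq(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }

/* Collect the NAK'd sequence numbers from a soft-ACK table starting at
 * @first_ack; bit 0 of an entry set means the packet was soft-ACK'd.
 * Scanning stops at the first NAK beyond @transmitted, mirroring the
 * "goto no_further_resend" in rxrpc_resend().  Returns the count written. */
static size_t collect_naks(uint32_t first_ack, const uint8_t *acks, size_t nr_acks,
			   uint32_t transmitted, uint32_t *out, size_t out_max)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_acks && n < out_max; i++) {
		uint32_t seq = first_ack + (uint32_t)i;

		if (acks[i] & 1)
			continue;		/* soft-ACK'd: nothing to resend */
		if (seq_after(seq, transmitted))
			break;			/* never sent, so it cannot be lost */
		out[n++] = seq;
	}
	return n;
}
```

The wraparound case in the usage below is the reason for the signed-distance form. A plain `>` test would treat sequence 1 as preceding 0xffffffff.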
*/ - if (after_eq(call->acks_prev_seq, call->tx_transmitted)) + seq = call->acks_prev_seq; + if (after_eq(seq, call->tx_transmitted)) goto no_further_resend; + seq++; - list_for_each_entry_from(txb, &call->tx_buffer, call_link) { - resend_at = ktime_add(txb->last_sent, rto); + while (after_eq(seq, tq->qbase + RXRPC_NR_TXQUEUE)) + tq = tq->next; - if (before_eq(txb->seq, call->acks_prev_seq)) + while (before_eq(seq, call->tx_transmitted)) { + qix = seq - tq->qbase; + if (qix >= RXRPC_NR_TXQUEUE) { + tq = tq->next; continue; - if (after(txb->seq, call->tx_transmitted)) - break; /* Not transmitted yet */ + } + txb = tq->bufs[qix]; + resend_at = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[qix]); + resend_at = ktime_add(resend_at, rto); if (ack && ack->reason == RXRPC_ACK_PING_RESPONSE && before(txb->serial, ntohl(ack->serial))) goto do_resend; /* Wasn't accounted for by a more recent ping. */ - if (ktime_after(resend_at, now)) { + if (ktime_after(resend_at, req.now)) { if (ktime_before(resend_at, next_resend)) next_resend = resend_at; + seq++; continue; } do_resend: unacked = true; - txb->flags |= RXRPC_TXBUF_RESENT; - rxrpc_transmit_data(call, txb, 1); - did_send = true; - rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans); - now = ktime_get_real(); + req.tq = tq; + req.seq = seq; + req.n = 1; + rxrpc_retransmit_data(call, &req, rto); + seq++; } no_further_resend: @@ -175,7 +207,8 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) if (resend_at < KTIME_MAX) { delay = rxrpc_get_rto_backoff(call->peer, did_send); resend_at = ktime_add(resend_at, delay); - trace_rxrpc_timer_set(call, resend_at - now, rxrpc_timer_trace_resend_reset); + trace_rxrpc_timer_set(call, resend_at - req.now, + rxrpc_timer_trace_resend_reset); } call->resend_at = resend_at; @@ -186,11 +219,11 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) * that an ACK got lost somewhere. Send a ping to find out instead of * retransmitting data. 
*/ - if (!did_send) { + if (!req.did_send) { ktime_t next_ping = ktime_add_us(call->acks_latest_ts, call->peer->srtt_us >> 3); - if (ktime_sub(next_ping, now) <= 0) + if (ktime_sub(next_ping, req.now) <= 0) rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, rxrpc_propose_ack_ping_for_0_retrans); } @@ -240,47 +273,68 @@ static unsigned int rxrpc_tx_window_space(struct rxrpc_call *call) } /* - * Decant some if the sendmsg prepared queue into the transmission buffer. + * Transmit some as-yet untransmitted data. */ -static void rxrpc_decant_prepared_tx(struct rxrpc_call *call) +static void rxrpc_transmit_fresh_data(struct rxrpc_call *call) { int space = rxrpc_tx_window_space(call); if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) { - if (list_empty(&call->tx_sendmsg)) + if (call->send_top == call->tx_top) return; rxrpc_expose_client_call(call); } while (space > 0) { - struct rxrpc_txbuf *head = NULL, *txb; - int count = 0, limit = min(space, 1); - - if (list_empty(&call->tx_sendmsg)) + struct rxrpc_send_data_req req = { + .now = ktime_get_real(), + .seq = call->tx_transmitted + 1, + .n = 0, + }; + struct rxrpc_txqueue *tq; + struct rxrpc_txbuf *txb; + rxrpc_seq_t send_top, seq; + int limit = min(space, 1); + + /* Order send_top before the contents of the new txbufs and + * txqueue pointers + */ + send_top = smp_load_acquire(&call->send_top); + if (call->tx_top == send_top) break; - trace_rxrpc_transmit(call, space); + trace_rxrpc_transmit(call, send_top, space); + + tq = call->tx_qtail; + seq = call->tx_top; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_decant); - spin_lock(&call->tx_lock); do { - txb = list_first_entry(&call->tx_sendmsg, - struct rxrpc_txbuf, call_link); - if (!head) - head = txb; - list_move_tail(&txb->call_link, &call->tx_buffer); - count++; + int ix; + + seq++; + ix = seq & RXRPC_TXQ_MASK; + if (!ix) { + tq = tq->next; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_decant_advance); + } + if (!req.tq) + req.tq = tq; + txb = tq->bufs[ix]; + req.n++; if (!txb->jumboable) 
break; - } while (count < limit && !list_empty(&call->tx_sendmsg)); + } while (req.n < limit && before(seq, send_top)); - spin_unlock(&call->tx_lock); - - call->tx_top = txb->seq; - if (txb->flags & RXRPC_LAST_PACKET) + if (txb->flags & RXRPC_LAST_PACKET) { rxrpc_close_tx_phase(call); + tq = NULL; + } + call->tx_qtail = tq; + call->tx_top = seq; - space -= count; - rxrpc_transmit_data(call, head, count); + space -= req.n; + rxrpc_send_data_packet(call, &req); } } @@ -288,7 +342,7 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call) { switch (__rxrpc_call_state(call)) { case RXRPC_CALL_SERVER_ACK_REQUEST: - if (list_empty(&call->tx_sendmsg)) + if (call->tx_bottom == READ_ONCE(call->send_top)) return; rxrpc_begin_service_reply(call); fallthrough; @@ -297,11 +351,11 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call) case RXRPC_CALL_CLIENT_SEND_REQUEST: if (!rxrpc_tx_window_space(call)) return; - if (list_empty(&call->tx_sendmsg)) { + if (call->tx_bottom == READ_ONCE(call->send_top)) { rxrpc_inc_stat(call->rxnet, stat_tx_data_underflow); return; } - rxrpc_decant_prepared_tx(call); + rxrpc_transmit_fresh_data(call); break; default: return; @@ -503,8 +557,6 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) call->peer->pmtud_pending) rxrpc_send_probe_for_pmtud(call); } - if (call->acks_hard_ack != call->tx_bottom) - rxrpc_shrink_call_tx_buffer(call); _leave(""); return true; diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index c026f16f891e..a9682b31a4f9 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -146,8 +146,6 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, INIT_LIST_HEAD(&call->recvmsg_link); INIT_LIST_HEAD(&call->sock_link); INIT_LIST_HEAD(&call->attend_link); - INIT_LIST_HEAD(&call->tx_sendmsg); - INIT_LIST_HEAD(&call->tx_buffer); skb_queue_head_init(&call->rx_queue); skb_queue_head_init(&call->recvmsg_queue); skb_queue_head_init(&call->rx_oos_queue); @@ -532,9 +530,26 @@ 
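rxrpc_transmit_fresh_data() builds each rxrpc_send_data_req by stepping `seq` from `tx_top + 1`. It stops when the request holds `limit` subpackets, when it reaches `send_top`, or when it meets a buffer that cannot be glued into a jumbo packet. In this patch the limit is `min(space, 1)`, so each request currently carries a single DATA packet. The simplified sketch below takes the limit as a parameter to show the general rule. The `jumboable` array stands in for `txb->jumboable`. Segment advancement is left out, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

struct send_req {
	uint32_t seq;	/* first sequence number in the (jumbo) packet */
	int n;		/* number of subpackets glued together */
};

/* One pass of the decant loop.  @tx_top is the highest sequence already
 * handed to the transmitter, @send_top the highest queued by sendmsg
 * (the caller ensures they differ).  jumboable[k] describes sequence
 * number tx_top + 1 + k. */
static struct send_req build_req(uint32_t tx_top, uint32_t send_top, int limit,
				 const bool *jumboable)
{
	struct send_req req = { .seq = tx_top + 1, .n = 0 };
	uint32_t seq = tx_top;

	do {
		seq++;
		req.n++;
		if (!jumboable[req.n - 1])
			break;	/* this buffer must end the packet */
	} while (req.n < limit && seq_before(seq, send_top));

	return req;
}
```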
void rxrpc_get_call(struct rxrpc_call *call, enum rxrpc_call_trace why) } /* - * Clean up the Rx skb ring. + * Clean up the transmission buffers. */ -static void rxrpc_cleanup_ring(struct rxrpc_call *call) +static void rxrpc_cleanup_tx_buffers(struct rxrpc_call *call) +{ + struct rxrpc_txqueue *tq, *next; + + for (tq = call->tx_queue; tq; tq = next) { + next = tq->next; + for (int i = 0; i < RXRPC_NR_TXQUEUE; i++) + if (tq->bufs[i]) + rxrpc_put_txbuf(tq->bufs[i], rxrpc_txbuf_put_cleaned); + trace_rxrpc_tq(call, tq, 0, rxrpc_tq_cleaned); + kfree(tq); + } +} + +/* + * Clean up the receive buffers. + */ +static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call) { rxrpc_purge_queue(&call->recvmsg_queue); rxrpc_purge_queue(&call->rx_queue); @@ -673,23 +688,12 @@ static void rxrpc_rcu_free_call(struct rcu_head *rcu) static void rxrpc_destroy_call(struct work_struct *work) { struct rxrpc_call *call = container_of(work, struct rxrpc_call, destroyer); - struct rxrpc_txbuf *txb; del_timer_sync(&call->timer); rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); - rxrpc_cleanup_ring(call); - while ((txb = list_first_entry_or_null(&call->tx_sendmsg, - struct rxrpc_txbuf, call_link))) { - list_del(&txb->call_link); - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_cleaned); - } - while ((txb = list_first_entry_or_null(&call->tx_buffer, - struct rxrpc_txbuf, call_link))) { - list_del(&txb->call_link); - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_cleaned); - } - + rxrpc_cleanup_tx_buffers(call); + rxrpc_cleanup_rx_buffers(call); rxrpc_put_txbuf(call->tx_pending, rxrpc_txbuf_put_cleaned); rxrpc_put_connection(call->conn, rxrpc_conn_put_call); rxrpc_deactivate_bundle(call->bundle); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 96fe005c5e81..cfdd23042d4c 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -214,24 +214,71 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, struct 
rxrpc_ack_summary *summary) { - struct rxrpc_txbuf *txb; + struct rxrpc_txqueue *tq = call->tx_queue; + rxrpc_seq_t seq = call->tx_bottom + 1; bool rot_last = false; - list_for_each_entry_rcu(txb, &call->tx_buffer, call_link, false) { - if (before_eq(txb->seq, call->acks_hard_ack)) - continue; - if (txb->flags & RXRPC_LAST_PACKET) { + _enter("%x,%x,%x", call->tx_bottom, call->acks_hard_ack, to); + + trace_rxrpc_tx_rotate(call, seq, to); + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate); + + /* We may have a left over fully-consumed buffer at the front that we + * couldn't drop before (rotate_and_keep below). + */ + if (seq == call->tx_qbase + RXRPC_NR_TXQUEUE) { + call->tx_qbase += RXRPC_NR_TXQUEUE; + call->tx_queue = tq->next; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); + kfree(tq); + tq = call->tx_queue; + } + + do { + unsigned int ix = seq - call->tx_qbase; + + _debug("tq=%x seq=%x i=%d f=%x", tq->qbase, seq, ix, tq->bufs[ix]->flags); + if (tq->bufs[ix]->flags & RXRPC_LAST_PACKET) { set_bit(RXRPC_CALL_TX_LAST, &call->flags); rot_last = true; } - if (txb->seq == to) - break; - } + rxrpc_put_txbuf(tq->bufs[ix], rxrpc_txbuf_put_rotated); + tq->bufs[ix] = NULL; + + WRITE_ONCE(call->tx_bottom, seq); + WRITE_ONCE(call->acks_hard_ack, seq); + trace_rxrpc_txqueue(call, (rot_last ? 
+ rxrpc_txqueue_rotate_last : + rxrpc_txqueue_rotate)); - if (rot_last) + seq++; + if (!(seq & RXRPC_TXQ_MASK)) { + prefetch(tq->next); + if (tq != call->tx_qtail) { + call->tx_qbase += RXRPC_NR_TXQUEUE; + call->tx_queue = tq->next; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); + kfree(tq); + tq = call->tx_queue; + } else { + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_keep); + tq = NULL; + break; + } + } + + } while (before_eq(seq, to)); + + if (rot_last) { set_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags); + if (tq) { + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); + kfree(tq); + call->tx_queue = NULL; + } + } - _enter("%x,%x,%x,%d", to, call->acks_hard_ack, call->tx_top, rot_last); + _debug("%x,%x,%x,%d", to, call->acks_hard_ack, call->tx_top, rot_last); if (call->acks_lowest_nak == call->acks_hard_ack) { call->acks_lowest_nak = to; @@ -240,11 +287,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, call->acks_lowest_nak = to; } - smp_store_release(&call->acks_hard_ack, to); - - trace_rxrpc_txqueue(call, (rot_last ? - rxrpc_txqueue_rotate_last : - rxrpc_txqueue_rotate)); wake_up(&call->waitq); return rot_last; } diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 400c3389d492..c2044d593237 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -375,10 +375,10 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call) /* * Prepare a (sub)packet for transmission. 
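rxrpc_rotate_tx_window() now releases txbufs itself as the hard-ACK point advances. It frees each segment once the bottom passes its last slot. The exception is the segment at `call->tx_qtail`, which is kept (the rotate_and_keep case) and dropped at the start of a later rotation. The userspace sketch below models only that segment bookkeeping. Reference counting, RXRPC_LAST_PACKET handling and tracing are left out. The `struct txq` fields loosely mirror `tx_queue`, `tx_qtail`, `tx_qbase` and `tx_bottom`, and every name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TXQ_NR   64u
#define TXQ_MASK (TXQ_NR - 1)

struct seg {
	struct seg *next;
	void *bufs[TXQ_NR];
};

/* Hypothetical per-call state. */
struct txq {
	struct seg *head, *tail;
	uint32_t qbase;		/* sequence number of slot 0 in head */
	uint32_t bottom;	/* highest sequence number rotated out */
	unsigned int freed;	/* segments released so far */
};

static void drop_head(struct txq *q)
{
	struct seg *old = q->head;

	q->head = old->next;
	q->qbase += TXQ_NR;
	free(old);
	q->freed++;
}

/* Discard everything up to and including @to, which must have been queued.
 * A fully consumed segment is freed unless it is the tail; a kept tail is
 * dropped by the leftover check on a later call. */
static void rotate(struct txq *q, uint32_t to)
{
	uint32_t seq = q->bottom + 1;

	if (seq == q->qbase + TXQ_NR)		/* leftover from last time */
		drop_head(q);

	while ((int32_t)(seq - to) <= 0) {	/* before_eq(seq, to) */
		q->head->bufs[seq - q->qbase] = NULL;	/* kernel puts the txbuf */
		q->bottom = seq;
		seq++;
		if (!(seq & TXQ_MASK)) {
			if (q->head != q->tail)
				drop_head(q);
			else
				break;		/* keep the tail for now */
		}
	}
}
```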
*/ -static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb, - rxrpc_serial_t serial, - int subpkt, int nr_subpkts, - ktime_t now) +static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, + struct rxrpc_send_data_req *req, + struct rxrpc_txbuf *txb, + rxrpc_serial_t serial, int subpkt) { struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; struct rxrpc_jumbo_header *jumbo = (void *)(whdr + 1) - sizeof(*jumbo); @@ -386,7 +386,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc struct rxrpc_connection *conn = call->conn; struct kvec *kv = &call->local->kvec[subpkt]; size_t len = txb->pkt_len; - bool last, more; + bool last; u8 flags; _enter("%x,%zd", txb->seq, len); @@ -401,14 +401,11 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS; last = txb->flags & RXRPC_LAST_PACKET; - if (subpkt < nr_subpkts - 1) { + if (subpkt < req->n - 1) { len = RXRPC_JUMBO_DATALEN; goto dont_set_request_ack; } - more = (!list_is_last(&txb->call_link, &call->tx_buffer) || - !list_empty(&call->tx_sendmsg)); - /* If our RTT cache needs working on, request an ACK. Also request * ACKs if a DATA packet appears to have been lost. 
* @@ -430,7 +427,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc why = rxrpc_reqack_more_rtt; else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), ktime_get_real())) why = rxrpc_reqack_old_rtt; - else if (!last && !more) + else if (!last && !after(READ_ONCE(call->send_top), txb->seq)) why = rxrpc_reqack_app_stall; else goto dont_set_request_ack; @@ -439,13 +436,13 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc trace_rxrpc_req_ack(call->debug_id, txb->seq, why); if (why != rxrpc_reqack_no_srv_last) { flags |= RXRPC_REQUEST_ACK; - rxrpc_begin_rtt_probe(call, serial, now, rxrpc_rtt_tx_data); - call->peer->rtt_last_req = now; + rxrpc_begin_rtt_probe(call, serial, req->now, rxrpc_rtt_tx_data); + call->peer->rtt_last_req = req->now; } dont_set_request_ack: /* The jumbo header overlays the wire header in the txbuf. */ - if (subpkt < nr_subpkts - 1) + if (subpkt < req->n - 1) flags |= RXRPC_JUMBO_PACKET; else flags &= ~RXRPC_JUMBO_PACKET; @@ -469,62 +466,100 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc return len; } +/* + * Prepare a transmission queue object for initial transmission. Returns the + * number of microseconds since the transmission queue base timestamp. + */ +static unsigned int rxrpc_prepare_txqueue(struct rxrpc_txqueue *tq, + struct rxrpc_send_data_req *req) +{ + if (!tq) + return 0; + if (tq->xmit_ts_base == KTIME_MIN) { + tq->xmit_ts_base = req->now; + return 0; + } + return ktime_to_us(ktime_sub(req->now, tq->xmit_ts_base)); +} + /* * Prepare a (jumbo) packet for transmission. 
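rxrpc_prepare_txqueue() sets a segment's `xmit_ts_base` lazily on first transmission, using KTIME_MIN as the "unset" marker. It then records each slot's send time as an unsigned microsecond offset from that base. The retransmission deadline is recovered as base + offset + RTO. The sketch below reproduces that arithmetic with `int64_t` nanoseconds standing in for `ktime_t`. `seg_ts`, `seg_stamp` and `seg_resend_at` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TS_UNSET INT64_MIN	/* plays the role of KTIME_MIN */

struct seg_ts {
	int64_t base_ns;		/* first transmission time in this segment */
	unsigned int xmit_us[64];	/* per-slot offset from base_ns, in usec */
};

/* Set the base on first use and return the offset of @now_ns from it in
 * microseconds (truncated), in the manner of rxrpc_prepare_txqueue(). */
static unsigned int seg_stamp(struct seg_ts *s, int64_t now_ns)
{
	if (s->base_ns == TS_UNSET) {
		s->base_ns = now_ns;
		return 0;
	}
	return (unsigned int)((now_ns - s->base_ns) / 1000);
}

/* Retransmission deadline for slot @ix: base + offset + RTO. */
static int64_t seg_resend_at(const struct seg_ts *s, unsigned int ix, int64_t rto_ns)
{
	return s->base_ns + (int64_t)s->xmit_us[ix] * 1000 + rto_ns;
}
```

A 32-bit microsecond offset halves the per-slot timestamp size compared with a 64-bit `ktime_t`. The cost is microsecond resolution and a range of about 71 minutes from the segment base.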
*/ -static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *head, int n) +static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req) { - struct rxrpc_txbuf *txb = head; + struct rxrpc_txqueue *tq = req->tq; rxrpc_serial_t serial; - ktime_t now = ktime_get_real(); + unsigned int xmit_ts; + rxrpc_seq_t seq = req->seq; size_t len = 0; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_transmit); + /* Each transmission of a Tx packet needs a new serial number */ - serial = rxrpc_get_next_serials(call->conn, n); + serial = rxrpc_get_next_serials(call->conn, req->n); - for (int i = 0; i < n; i++) { - txb->last_sent = now; - len += rxrpc_prepare_data_subpacket(call, txb, serial, i, n, now); - serial++; - txb = list_next_entry(txb, call_link); - } + call->tx_last_sent = req->now; + xmit_ts = rxrpc_prepare_txqueue(tq, req); + prefetch(tq->next); - /* Set timeouts */ - if (call->peer->rtt_count > 1) { - ktime_t delay = rxrpc_get_rto_backoff(call->peer, false); + for (int i = 0;;) { + int ix = seq & RXRPC_TXQ_MASK; + struct rxrpc_txbuf *txb = tq->bufs[seq & RXRPC_TXQ_MASK]; - call->ack_lost_at = ktime_add(now, delay); - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); + _debug("prep[%u] tq=%x q=%x", i, tq->qbase, seq); + tq->segment_xmit_ts[ix] = xmit_ts; + len += rxrpc_prepare_data_subpacket(call, req, txb, serial, i); + serial++; + seq++; + i++; + if (i >= req->n) + break; + if (!(seq & RXRPC_TXQ_MASK)) { + tq = tq->next; + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_transmit_advance); + xmit_ts = rxrpc_prepare_txqueue(tq, req); + } } + /* Set timeouts */ if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) { ktime_t delay = ms_to_ktime(READ_ONCE(call->next_rx_timo)); - call->expect_rx_by = ktime_add(now, delay); + call->expect_rx_by = ktime_add(req->now, delay); trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_expect_rx); } + if (call->resend_at == KTIME_MAX) { + ktime_t delay = 
rxrpc_get_rto_backoff(call->peer, false); + + call->resend_at = ktime_add(req->now, delay); + trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_resend); + } - rxrpc_set_keepalive(call, now); + rxrpc_set_keepalive(call, req->now); return len; } /* - * send a packet through the transport endpoint + * Send one or more packets through the transport endpoint */ -static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n) +void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req) { struct rxrpc_connection *conn = call->conn; enum rxrpc_tx_point frag; + struct rxrpc_txqueue *tq = req->tq; + struct rxrpc_txbuf *txb; struct msghdr msg; + rxrpc_seq_t seq = req->seq; size_t len; bool new_call = test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags); int ret; - _enter("%x,{%d}", txb->seq, txb->pkt_len); + _enter("%x,%x-%x", tq->qbase, seq, seq + req->n - 1); - len = rxrpc_prepare_data_packet(call, txb, n); + len = rxrpc_prepare_data_packet(call, req); + txb = tq->bufs[seq & RXRPC_TXQ_MASK]; - iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, n, len); + iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, req->n, len); msg.msg_name = &call->peer->srx.transport; msg.msg_namelen = call->peer->srx.transport_len; @@ -535,7 +570,7 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t /* Send the packet with the don't fragment bit set unless we think it's * too big or if this is a retransmission. */ - if (txb->seq == call->tx_transmitted + 1 && + if (seq == call->tx_transmitted + 1 && len >= sizeof(struct rxrpc_wire_header) + call->peer->max_data) { rxrpc_local_dont_fragment(conn->local, false); frag = rxrpc_tx_point_call_data_frag; @@ -548,8 +583,8 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t * retransmission algorithm doesn't try to resend what we haven't sent * yet. 
*/ - if (txb->seq == call->tx_transmitted + 1) - call->tx_transmitted = txb->seq + n - 1; + if (seq == call->tx_transmitted + 1) + call->tx_transmitted = seq + req->n - 1; if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) { static int lose; @@ -586,19 +621,21 @@ static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t rxrpc_tx_backoff(call, ret); if (ret < 0) { - /* Cancel the call if the initial transmission fails, - * particularly if that's due to network routing issues that - * aren't going away anytime soon. The layer above can arrange - * the retransmission. + /* Cancel the call if the initial transmission fails or if we + * hit network routing issues that aren't going away + * anytime soon. The layer above can arrange the + * retransmission. */ - if (new_call) + if (new_call || + ret == -ENETUNREACH || + ret == -EHOSTUNREACH || + ret == -ECONNREFUSED) rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR, RX_USER_ABORT, ret); } done: _leave(" = %d [%u]", ret, call->peer->max_data); - return ret; } /* @@ -773,41 +810,3 @@ void rxrpc_send_keepalive(struct rxrpc_peer *peer) peer->last_tx_at = ktime_get_seconds(); _leave(""); } - -/* - * Schedule an instant Tx resend. - */ -static inline void rxrpc_instant_resend(struct rxrpc_call *call, - struct rxrpc_txbuf *txb) -{ - if (!__rxrpc_call_is_complete(call)) - kdebug("resend"); -} - -/* - * Transmit a packet, possibly gluing several subpackets together.
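With rxrpc_transmit_data() removed, the error handling now lives in rxrpc_send_data_packet(). The call is completed with RXRPC_CALL_LOCAL_ERROR either when `new_call` is set or when the send fails with one of three routing-type errors. The condition can be restated as a pure predicate, sketched below. `should_abort_call` is a hypothetical name, `new_call` stands for the flag computed at the top of the kernel function, and the errno constants are the standard ones from `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical restatement of the abort condition in
 * rxrpc_send_data_packet(): @ret is the result of the UDP send
 * (negative errno on failure). */
static bool should_abort_call(int ret, bool new_call)
{
	if (ret >= 0)
		return false;		/* sent successfully */
	return new_call ||
	       ret == -ENETUNREACH ||	/* no route to the network */
	       ret == -EHOSTUNREACH ||	/* no route to the host */
	       ret == -ECONNREFUSED;	/* peer port unreachable */
}
```

Any other error, such as a transient `-ENOBUFS`, leaves the call alive so that the normal resend timer can retry.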
- */ -void rxrpc_transmit_data(struct rxrpc_call *call, struct rxrpc_txbuf *txb, int n) -{ - int ret; - - ret = rxrpc_send_data_packet(call, txb, n); - if (ret < 0) { - switch (ret) { - case -ENETUNREACH: - case -EHOSTUNREACH: - case -ECONNREFUSED: - rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR, - 0, ret); - break; - default: - _debug("need instant resend %d", ret); - rxrpc_instant_resend(call, txb); - } - } else { - ktime_t delay = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC); - - call->resend_at = ktime_add(ktime_get_real(), delay); - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_resend_tx); - } -} diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 467c9402882e..85b35b11755d 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -98,7 +98,7 @@ static bool rxrpc_check_tx_space(struct rxrpc_call *call, rxrpc_seq_t *_tx_win) if (_tx_win) *_tx_win = tx_bottom; - return call->tx_prepared - tx_bottom < 256; + return call->send_top - tx_bottom < 256; } /* @@ -242,36 +242,74 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, struct rxrpc_txbuf *txb, rxrpc_notify_end_tx_t notify_end_tx) { + struct rxrpc_txqueue *sq = call->send_queue; rxrpc_seq_t seq = txb->seq; bool poke, last = txb->flags & RXRPC_LAST_PACKET; - + int ix = seq & RXRPC_TXQ_MASK; rxrpc_inc_stat(call->rxnet, stat_tx_data); - ASSERTCMP(txb->seq, ==, call->tx_prepared + 1); - - /* We have to set the timestamp before queueing as the retransmit - * algorithm can see the packet as soon as we queue it. 
- */ - txb->last_sent = ktime_get_real(); + ASSERTCMP(txb->seq, ==, call->send_top + 1); if (last) trace_rxrpc_txqueue(call, rxrpc_txqueue_queue_last); else trace_rxrpc_txqueue(call, rxrpc_txqueue_queue); + if (WARN_ON_ONCE(sq->bufs[ix])) + trace_rxrpc_tq(call, sq, seq, rxrpc_tq_queue_dup); + else + trace_rxrpc_tq(call, sq, seq, rxrpc_tq_queue); + /* Add the packet to the call's output buffer */ spin_lock(&call->tx_lock); - poke = list_empty(&call->tx_sendmsg); - list_add_tail(&txb->call_link, &call->tx_sendmsg); - call->tx_prepared = seq; - if (last) + poke = (READ_ONCE(call->tx_bottom) == call->send_top); + sq->bufs[ix] = txb; + /* Order send_top after the queue->next pointer and txb content. */ + smp_store_release(&call->send_top, seq); + if (last) { rxrpc_notify_end_tx(rx, call, notify_end_tx); + call->send_queue = NULL; + } spin_unlock(&call->tx_lock); if (poke) rxrpc_poke_call(call, rxrpc_call_poke_start); } +/* + * Allocate a new txqueue unit and add it to the transmission queue. + */ +static int rxrpc_alloc_txqueue(struct sock *sk, struct rxrpc_call *call) +{ + struct rxrpc_txqueue *tq; + + tq = kzalloc(sizeof(*tq), sk->sk_allocation); + if (!tq) + return -ENOMEM; + + tq->xmit_ts_base = KTIME_MIN; + for (int i = 0; i < RXRPC_NR_TXQUEUE; i++) + tq->segment_xmit_ts[i] = UINT_MAX; + + if (call->send_queue) { + tq->qbase = call->send_top + 1; + call->send_queue->next = tq; + call->send_queue = tq; + } else if (WARN_ON(call->tx_queue)) { + kfree(tq); + return -ENOMEM; + } else { + tq->qbase = 0; + call->tx_qbase = 0; + call->send_queue = tq; + call->tx_qtail = tq; + call->tx_queue = tq; + } + + trace_rxrpc_tq(call, tq, call->send_top, rxrpc_tq_alloc); + return 0; +} + /* * send data through a socket * - must be called in process context @@ -346,6 +384,13 @@ static int rxrpc_send_data(struct rxrpc_sock *rx, if (!rxrpc_check_tx_space(call, NULL)) goto wait_for_space; + /* See if we need to begin/extend the Tx queue. 
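rxrpc_queue_packet() now stores the txbuf into its slot and only then advances `call->send_top` with smp_store_release(). The transmitter reads `send_top` with smp_load_acquire() before touching any slot. The sketch below shows the same publication pattern in userspace with C11 atomics, assuming one producer and one consumer. `memory_order_release`/`memory_order_acquire` are used as stand-ins for the kernel primitives, not as an exact equivalence. `handoff`, `publish` and `take_next` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TXQ_NR   64u
#define TXQ_MASK (TXQ_NR - 1)

/* Hypothetical model of the sendmsg -> transmitter handoff. */
struct handoff {
	void *bufs[TXQ_NR];
	_Atomic uint32_t send_top;	/* highest slot published by the producer */
	uint32_t tx_top;		/* highest slot taken by the consumer */
};

/* Producer: fill the slot, then publish it with a release store so that a
 * consumer observing the new send_top also observes bufs[ix]. */
static void publish(struct handoff *h, uint32_t seq, void *buf)
{
	h->bufs[seq & TXQ_MASK] = buf;
	atomic_store_explicit(&h->send_top, seq, memory_order_release);
}

/* Consumer: acquire send_top, then read only slots up to it.  Returns the
 * buffer for the next sequence number, or NULL if none is published. */
static void *take_next(struct handoff *h)
{
	uint32_t top = atomic_load_explicit(&h->send_top, memory_order_acquire);

	if (h->tx_top == top)
		return NULL;
	h->tx_top++;
	return h->bufs[h->tx_top & TXQ_MASK];
}
```

The pairing replaces the old `tx_lock`-protected list move on the transmit side. The transmitter never has to take the lock just to discover new data.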
*/ + if (!call->send_queue || !((call->send_top + 1) & RXRPC_TXQ_MASK)) { + ret = rxrpc_alloc_txqueue(sk, call); + if (ret < 0) + goto maybe_error; + } + /* Work out the maximum size of a packet. Assume that * the security header is going to be in the padded * region (enc blocksize), but the trailer is not. diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c index 0cc8f49d69a9..067223c8c35f 100644 --- a/net/rxrpc/txbuf.c +++ b/net/rxrpc/txbuf.c @@ -43,17 +43,14 @@ struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_ whdr = buf + hoff; - INIT_LIST_HEAD(&txb->call_link); - INIT_LIST_HEAD(&txb->tx_link); refcount_set(&txb->ref, 1); - txb->last_sent = KTIME_MIN; txb->call_debug_id = call->debug_id; txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids); txb->alloc_size = data_size; txb->space = data_size; txb->offset = sizeof(*whdr); txb->flags = call->conn->out_clientflag; - txb->seq = call->tx_prepared + 1; + txb->seq = call->send_top + 1; txb->nr_kvec = 1; txb->kvec[0].iov_base = whdr; txb->kvec[0].iov_len = sizeof(*whdr); @@ -114,8 +111,6 @@ struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_s filler = buf + sizeof(*whdr) + sizeof(*ack) + 1; trailer = buf + sizeof(*whdr) + sizeof(*ack) + 1 + 3; - INIT_LIST_HEAD(&txb->call_link); - INIT_LIST_HEAD(&txb->tx_link); refcount_set(&txb->ref, 1); txb->call_debug_id = call->debug_id; txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids); @@ -200,37 +195,3 @@ void rxrpc_put_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what) rxrpc_free_txbuf(txb); } } - -/* - * Shrink the transmit buffer. 
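rxrpc_send_data() allocates a new segment when sendmsg has none yet, or when the next sequence number (`send_top + 1`) would fall into slot 0. The first segment gets qbase 0. Later segments get `send_top + 1`. Separately, rxrpc_check_tx_space() now bounds the backlog by `send_top - tx_bottom < 256`. The predicates are sketched below with hypothetical names. The sketch assumes `send_top` starts at 0, so the first DATA packet is sequence 1 and slot 0 of the first segment stays unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TXQ_NR   64u
#define TXQ_MASK (TXQ_NR - 1)

/* A new segment is needed if sendmsg has no current segment, or if the
 * next sequence number lands in slot 0 of a fresh segment. */
static bool need_new_segment(bool have_send_queue, uint32_t send_top)
{
	return !have_send_queue || ((send_top + 1) & TXQ_MASK) == 0;
}

/* qbase for the new segment: 0 for the first, else the next sequence. */
static uint32_t new_segment_qbase(bool have_send_queue, uint32_t send_top)
{
	return have_send_queue ? send_top + 1 : 0;
}

/* Mirror of rxrpc_check_tx_space(): sendmsg may queue another packet while
 * fewer than 256 are outstanding (unsigned subtraction handles wrap). */
static bool have_tx_space(uint32_t send_top, uint32_t tx_bottom)
{
	return send_top - tx_bottom < 256;
}
```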
- */ -void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *call) -{ - struct rxrpc_txbuf *txb; - rxrpc_seq_t hard_ack = smp_load_acquire(&call->acks_hard_ack); - bool wake = false; - - _enter("%x/%x/%x", call->tx_bottom, call->acks_hard_ack, call->tx_top); - - while ((txb = list_first_entry_or_null(&call->tx_buffer, - struct rxrpc_txbuf, call_link))) { - hard_ack = call->acks_hard_ack; - if (before(hard_ack, txb->seq)) - break; - - if (txb->seq != call->tx_bottom + 1) - rxrpc_see_txbuf(txb, rxrpc_txbuf_see_out_of_step); - ASSERTCMP(txb->seq, ==, call->tx_bottom + 1); - WRITE_ONCE(call->tx_bottom, call->tx_bottom + 1); - list_del_rcu(&txb->call_link); - - trace_rxrpc_txqueue(call, rxrpc_txqueue_dequeue); - - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_rotated); - if (after(call->acks_hard_ack, call->tx_bottom + 128)) - wake = true; - } - - if (wake) - wake_up(&call->waitq); -} From patchwork Wed Dec 4 07:46:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893336 X-Patchwork-Delegate: kuba@kernel.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C79EC1B0F1A for ; Wed, 4 Dec 2024 07:48:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733298524; cv=none; b=CpFSvufY3IcRLTbZqKIdF0+gZY4CAUqzJ7UFac1YW+55f8biiV54Hamp1OUPWuUnP0JVYJvbe9dV6CVyP9aYalCGiB/egs0IY/gCMK2JMMB2y9TazySJUnA5EOMrMykMgvV9u81xaCXRqta84j+aEGlplqkZ3lp3t7HDsGnpvFY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733298524; c=relaxed/simple; bh=Knw5PTuo01zCEQSacz+NuyAixKPfO8OuW0zbkVhfk7Y=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zoj6wlQmEGSbVuYmvp2MXBhDuI0aaelQeJBYEOzjM+RzqoU1dsumVXnd1bYhRyULROds+SG8OuMPtCGkMMM5qF947Z4hIeIOy0LaCW0Cs6D7FSuBugiG/ft1FdI9GhvTrbRmeqjb+e93ofWIKRu5xTQQ716yJ1qVeauPvmwtxWo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Abn7MVe0; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Abn7MVe0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733298521; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XBfMJA0TbcZQRACIqnLVSGKyizcs7/EIPvegkIJmBGk=; b=Abn7MVe0oNf9++q8b5tlIfmeFAwfFG/8NfbHHuUBXKQDWN9WR1VH1G1Yes8hpxIK5tS1nK pgjK/lBRJmOimrxdZ/maHzLjyyqkihAVxXD4HuwRPVE+8kK45GMcy38H7xrwTF8RC1DnXz 8u/WirSXDcJR7c+AURUzSnOk274U3U4= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-685-lLQKnTOUPAueiiOz5eIgkQ-1; Wed, 04 Dec 2024 02:48:38 -0500 X-MC-Unique: lLQKnTOUPAueiiOz5eIgkQ-1 X-Mimecast-MFC-AGG-ID: lLQKnTOUPAueiiOz5eIgkQ Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 19/39] rxrpc: call->acks_hard_ack is now the same call->tx_bottom, so remove it
Date: Wed, 4 Dec 2024 07:46:47 +0000
Message-ID: <20241204074710.990092-20-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Now that packets are removed from the Tx queue in the rotation function
rather than being cleaned up later, call->acks_hard_ack now advances in
step with call->tx_bottom, so remove it.

In some of the rxrpc tracepoints, uses of call->acks_hard_ack are switched
to call->acks_first_seq instead, as that reflects the peer's reported idea
of the hard-ACK point.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 20 ++++++++++----------
 net/rxrpc/ar-internal.h      |  1 -
 net/rxrpc/call_event.c       |  4 ++--
 net/rxrpc/input.c            | 17 ++++++++---------
 net/rxrpc/proc.c             |  6 +++---
 net/rxrpc/sendmsg.c          |  6 +++---
 6 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index e6cf9ec940aa..0f253287de00 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -892,8 +892,8 @@ TRACE_EVENT(rxrpc_txqueue,
 	    TP_STRUCT__entry(
 		    __field(unsigned int, call)
 		    __field(enum rxrpc_txqueue_trace, why)
-		    __field(rxrpc_seq_t, acks_hard_ack)
 		    __field(rxrpc_seq_t, tx_bottom)
+		    __field(rxrpc_seq_t, acks_first_seq)
 		    __field(rxrpc_seq_t, tx_top)
 		    __field(rxrpc_seq_t, send_top)
 		    __field(int, tx_winsize)
@@ -902,8 +902,8 @@ TRACE_EVENT(rxrpc_txqueue,
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
 		    __entry->why = why;
-		    __entry->acks_hard_ack = call->acks_hard_ack;
 		    __entry->tx_bottom = call->tx_bottom;
+		    __entry->acks_first_seq = call->acks_first_seq;
 		    __entry->tx_top = call->tx_top;
 		    __entry->send_top = call->send_top;
 		    __entry->tx_winsize = call->tx_winsize;
@@ -913,9 +913,9 @@ TRACE_EVENT(rxrpc_txqueue,
 		      __entry->call,
 		      __print_symbolic(__entry->why, rxrpc_txqueue_traces),
 		      __entry->tx_bottom,
-		      __entry->acks_hard_ack,
-		      __entry->tx_top - __entry->tx_bottom,
-		      __entry->tx_top - __entry->acks_hard_ack,
+		      __entry->acks_first_seq,
+		      __entry->acks_first_seq - __entry->tx_bottom,
+		      __entry->tx_top - __entry->acks_first_seq,
 		      __entry->send_top - __entry->tx_top,
 		      __entry->tx_winsize)
 	    );
@@ -945,7 +945,7 @@ TRACE_EVENT(rxrpc_transmit,
 		    __entry->cong_cwnd = call->cong_cwnd;
 		    __entry->cong_extra = call->cong_extra;
 		    __entry->prepared = send_top - call->tx_bottom;
-		    __entry->in_flight = call->tx_top - call->acks_hard_ack;
+		    __entry->in_flight = call->tx_top - call->tx_bottom;
 		    __entry->pmtud_jumbo = call->peer->pmtud_jumbo;
 			   ),

@@ -1707,7 +1707,7 @@ TRACE_EVENT(rxrpc_congest,
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
 		    __entry->change = change;
-		    __entry->hard_ack = call->acks_hard_ack;
+		    __entry->hard_ack = call->acks_first_seq;
 		    __entry->top = call->tx_top;
 		    __entry->lowest_nak = call->acks_lowest_nak;
 		    __entry->ack_serial = ack_serial;
@@ -1754,7 +1754,7 @@ TRACE_EVENT(rxrpc_reset_cwnd,
 		    __entry->mode = call->cong_mode;
 		    __entry->cwnd = call->cong_cwnd;
 		    __entry->extra = call->cong_extra;
-		    __entry->hard_ack = call->acks_hard_ack;
+		    __entry->hard_ack = call->acks_first_seq;
 		    __entry->prepared = call->send_top - call->tx_bottom;
 		    __entry->since_last_tx = ktime_sub(now, call->tx_last_sent);
 		    __entry->has_data = call->tx_bottom != call->tx_top;
@@ -1855,7 +1855,7 @@ TRACE_EVENT(rxrpc_resend,
 	    TP_fast_assign(
 		    struct rxrpc_skb_priv *sp = ack ? rxrpc_skb(ack) : NULL;
 		    __entry->call = call->debug_id;
-		    __entry->seq = call->acks_hard_ack;
+		    __entry->seq = call->acks_first_seq;
 		    __entry->transmitted = call->tx_transmitted;
 		    __entry->ack_serial = sp ? sp->hdr.serial : 0;
 			   ),
@@ -1944,7 +1944,7 @@ TRACE_EVENT(rxrpc_call_reset,
 		    __entry->call_id = call->call_id;
 		    __entry->call_serial = call->rx_serial;
 		    __entry->conn_serial = call->conn->hi_serial;
-		    __entry->tx_seq = call->acks_hard_ack;
+		    __entry->tx_seq = call->acks_first_seq;
 		    __entry->rx_seq = call->rx_highest_seq;
 			   ),

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index bcce4862b0b7..6683043cee3f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -759,7 +759,6 @@ struct rxrpc_call {
 	ktime_t			acks_latest_ts;	/* Timestamp of latest ACK received */
 	rxrpc_seq_t		acks_first_seq;	/* first sequence number received */
 	rxrpc_seq_t		acks_prev_seq;	/* Highest previousPacket received */
-	rxrpc_seq_t		acks_hard_ack;	/* Latest hard-ack point */
 	rxrpc_seq_t		acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */
 	rxrpc_serial_t		acks_highest_serial; /* Highest serial number ACK'd */
 };
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 90e3d9395675..2311e5c737e8 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -109,7 +109,7 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
 	bool unacked = false, did_send = false;
 	unsigned int qix;

-	_enter("{%d,%d}", call->acks_hard_ack, call->tx_top);
+	_enter("{%d,%d}", call->tx_bottom, call->tx_top);

 	if (call->tx_bottom == call->tx_top)
 		goto no_resend;
@@ -267,7 +267,7 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)
 static unsigned int rxrpc_tx_window_space(struct rxrpc_call *call)
 {
 	int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra);
-	int in_flight = call->tx_top - call->acks_hard_ack;
+	int in_flight = call->tx_top - call->tx_bottom;

 	return max(winsize - in_flight, 0);
 }
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index cfdd23042d4c..afb87a3322da 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -40,7 +40,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 	bool resend = false;

 	summary->flight_size =
-		(call->tx_top - call->acks_hard_ack) - summary->nr_acks;
+		(call->tx_top - call->tx_bottom) - summary->nr_acks;

 	if (test_and_clear_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags)) {
 		summary->retrans_timeo = true;
@@ -175,7 +175,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 	 * state.
 	 */
 	if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) ||
-	    summary->nr_acks != call->tx_top - call->acks_hard_ack) {
+	    summary->nr_acks != call->tx_top - call->tx_bottom) {
 		call->cong_extra++;
 		wake_up(&call->waitq);
 	}
@@ -218,7 +218,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 	rxrpc_seq_t seq = call->tx_bottom + 1;
 	bool rot_last = false;

-	_enter("%x,%x,%x", call->tx_bottom, call->acks_hard_ack, to);
+	_enter("%x,%x", call->tx_bottom, to);

 	trace_rxrpc_tx_rotate(call, seq, to);
 	trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate);
@@ -246,7 +246,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		tq->bufs[ix] = NULL;

 		WRITE_ONCE(call->tx_bottom, seq);
-		WRITE_ONCE(call->acks_hard_ack, seq);
 		trace_rxrpc_txqueue(call, (rot_last ?
 					   rxrpc_txqueue_rotate_last :
 					   rxrpc_txqueue_rotate));
@@ -278,9 +277,9 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		}
 	}

-	_debug("%x,%x,%x,%d", to, call->acks_hard_ack, call->tx_top, rot_last);
+	_debug("%x,%x,%x,%d", to, call->tx_bottom, call->tx_top, rot_last);

-	if (call->acks_lowest_nak == call->acks_hard_ack) {
+	if (call->acks_lowest_nak == call->tx_bottom) {
 		call->acks_lowest_nak = to;
 	} else if (after(to, call->acks_lowest_nak)) {
 		summary->new_low_nack = true;
@@ -968,7 +967,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	if (unlikely(summary.ack_reason == RXRPC_ACK_OUT_OF_SEQUENCE) &&
 	    first_soft_ack == 1 &&
 	    prev_pkt == 0 &&
-	    call->acks_hard_ack == 0 &&
+	    call->tx_bottom == 0 &&
 	    rxrpc_is_client_call(call)) {
 		rxrpc_set_call_completion(call, RXRPC_CALL_REMOTELY_ABORTED,
 					  0, -ENETRESET);
@@ -1033,13 +1032,13 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 		goto send_response;
 	}

-	if (before(hard_ack, call->acks_hard_ack) ||
+	if (before(hard_ack, call->tx_bottom) ||
 	    after(hard_ack, call->tx_top))
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_outside_window);
 	if (nr_acks > call->tx_top - hard_ack)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_sack_overflow);

-	if (after(hard_ack, call->acks_hard_ack)) {
+	if (after(hard_ack, call->tx_bottom)) {
 		if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) {
 			rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ack);
 			goto send_response;
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 249e1ed9c5c9..a8325b8e33c2 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -52,7 +52,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 	struct rxrpc_call *call;
 	struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
 	enum rxrpc_call_state state;
-	rxrpc_seq_t acks_hard_ack;
+	rxrpc_seq_t tx_bottom;
 	char lbuff[50], rbuff[50];
 	long timeout = 0;

@@ -79,7 +79,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 	if (state != RXRPC_CALL_SERVER_PREALLOC)
 		timeout = ktime_ms_delta(READ_ONCE(call->expect_rx_by), ktime_get_real());

-	acks_hard_ack = READ_ONCE(call->acks_hard_ack);
+	tx_bottom = READ_ONCE(call->tx_bottom);
 	seq_printf(seq,
 		   "UDP %-47.47s %-47.47s %4x %08x %08x %s %3u"
 		   " %-8.8s %08x %08x %08x %02x %08x %02x %08x %02x %06lx\n",
@@ -93,7 +93,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 		   rxrpc_call_states[state],
 		   call->abort_code,
 		   call->debug_id,
-		   acks_hard_ack, READ_ONCE(call->tx_top) - acks_hard_ack,
+		   tx_bottom, READ_ONCE(call->tx_top) - tx_bottom,
 		   call->ackr_window, call->ackr_wtop - call->ackr_window,
 		   call->rx_serial,
 		   call->cong_cwnd,
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 85b35b11755d..dfbf9f4b24b6 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -140,7 +140,7 @@ static int rxrpc_wait_for_tx_window_waitall(struct rxrpc_sock *rx,
 		rtt = 2;

 	timeout = rtt;
-	tx_start = READ_ONCE(call->acks_hard_ack);
+	tx_start = READ_ONCE(call->tx_bottom);

 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
@@ -197,8 +197,8 @@ static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx,
 	DECLARE_WAITQUEUE(myself, current);
 	int ret;

-	_enter(",{%u,%u,%u,%u}",
-	       call->tx_bottom, call->acks_hard_ack, call->tx_top, call->tx_winsize);
+	_enter(",{%u,%u,%u}",
+	       call->tx_bottom, call->tx_top, call->tx_winsize);

 	add_wait_queue(&call->waitq, &myself);

From patchwork Wed Dec 4 07:46:48 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893337
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 20/39] rxrpc: Replace call->acks_first_seq with tracking of the hard ACK point
Date: Wed, 4 Dec 2024 07:46:48 +0000
Message-ID: <20241204074710.990092-21-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Replace the call->acks_first_seq variable (which holds ack.firstPacket from
the latest ACK packet and indicates the sequence number of the first ack
slot in the SACK table) with call->acks_hard_ack, which holds the highest
sequence number that has been hard-ACK'd.
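
To illustrate the off-by-one relationship, here is a minimal userspace sketch. It is not kernel code and makes some assumptions. With hard_ack = firstPacket - 1, slot i of the SACK table describes sequence number hard_ack + 1 + i. This is why the reworked loop in rxrpc_input_soft_acks() increments seq before examining each slot, and why it starts lowest_nak one past the end of the table. The following names are hypothetical and exist only for this illustration: summarise_sack(), struct sack_summary, SACK_ACK/SACK_NACK and seq_after(). seq_after() assumes the same wrap-safe signed-difference comparison as the kernel's after() helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical soft-ACK slot values standing in for RXRPC_ACK_TYPE_*. */
enum { SACK_NACK = 0, SACK_ACK = 1 };

/* Wrap-safe "a is after b", written in the style of the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

struct sack_summary {
	uint32_t hard_ack;        /* highest seq hard-ACK'd (firstPacket - 1) */
	uint32_t lowest_nak;      /* lowest NAK'd seq, else one past the table */
	unsigned int nr_acks;     /* soft-ACK'd slots in the table */
	unsigned int nr_new_acks; /* soft-ACK'd slots whose seq is after 'since' */
};

/* Summarise a SACK table given the peer's firstPacket value (sketch only). */
static struct sack_summary summarise_sack(uint32_t first_packet,
					  const uint8_t *acks,
					  unsigned int nr_acks,
					  uint32_t since)
{
	struct sack_summary s = { 0 };
	uint32_t seq;

	assert(nr_acks <= 255); /* nAcks is an 8-bit field on the wire */

	s.hard_ack = first_packet - 1;
	s.lowest_nak = s.hard_ack + nr_acks + 1;
	seq = s.hard_ack;

	for (unsigned int i = 0; i < nr_acks; i++) {
		seq++; /* slot i describes hard_ack + 1 + i */
		if (acks[i] == SACK_ACK) {
			s.nr_acks++;
			if (seq_after(seq, since))
				s.nr_new_acks++;
		} else if (seq_after(s.lowest_nak, seq)) {
			s.lowest_nak = seq;
		}
	}
	return s;
}
```

For example, take firstPacket = 5, slots {ACK, NACK, ACK, ACK} and since = 4. This gives hard_ack = 4, lowest_nak = 6 and three soft ACKs, all of them new. With tx_top = 8, tx_top - hard_ack yields a window of 4 packets, whereas tx_top - firstPacket would report 3. Once everything up to tx_top has been hard-ACK'd, firstPacket is tx_top + 1, so the old difference comes out as -1.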

This is 1 less than call->acks_first_seq, but it fits the same schema as
the other tracking variables, which hold the sequence number of a packet
rather than one past it.

This fixes the rxrpc_congest tracepoint's calculation of the SACK window
size, which currently shows one fewer than it should and occasionally goes
to -1.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 68 +++++++++++++++++------------------
 net/rxrpc/ar-internal.h      |  2 +-
 net/rxrpc/input.c            | 56 ++++++++++++++---------------
 3 files changed, 59 insertions(+), 67 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 0f253287de00..91108e0de3af 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -893,7 +893,7 @@ TRACE_EVENT(rxrpc_txqueue,
 		    __field(unsigned int, call)
 		    __field(enum rxrpc_txqueue_trace, why)
 		    __field(rxrpc_seq_t, tx_bottom)
-		    __field(rxrpc_seq_t, acks_first_seq)
+		    __field(rxrpc_seq_t, acks_hard_ack)
 		    __field(rxrpc_seq_t, tx_top)
 		    __field(rxrpc_seq_t, send_top)
 		    __field(int, tx_winsize)
@@ -903,19 +903,19 @@ TRACE_EVENT(rxrpc_txqueue,
 		    __entry->call = call->debug_id;
 		    __entry->why = why;
 		    __entry->tx_bottom = call->tx_bottom;
-		    __entry->acks_first_seq = call->acks_first_seq;
+		    __entry->acks_hard_ack = call->acks_hard_ack;
 		    __entry->tx_top = call->tx_top;
 		    __entry->send_top = call->send_top;
 		    __entry->tx_winsize = call->tx_winsize;
 			   ),

-	    TP_printk("c=%08x %s f=%08x h=%08x n=%u/%u/%u/%u",
+	    TP_printk("c=%08x %s b=%08x h=%08x n=%u/%u/%u/%u",
 		      __entry->call,
 		      __print_symbolic(__entry->why, rxrpc_txqueue_traces),
 		      __entry->tx_bottom,
-		      __entry->acks_first_seq,
-		      __entry->acks_first_seq - __entry->tx_bottom,
-		      __entry->tx_top - __entry->acks_first_seq,
+		      __entry->acks_hard_ack,
+		      __entry->acks_hard_ack - __entry->tx_bottom,
+		      __entry->tx_top - __entry->acks_hard_ack,
 		      __entry->send_top - __entry->tx_top,
 		      __entry->tx_winsize)
 	    );
@@ -1015,11 +1015,9 @@ TRACE_EVENT(rxrpc_rx_data,
 	    );

 TRACE_EVENT(rxrpc_rx_ack,
-	    TP_PROTO(struct rxrpc_call *call,
-		     rxrpc_serial_t serial, rxrpc_serial_t ack_serial,
-		     rxrpc_seq_t first, rxrpc_seq_t prev, u8 reason, u8 n_acks),
+	    TP_PROTO(struct rxrpc_call *call, struct rxrpc_skb_priv *sp),

-	    TP_ARGS(call, serial, ack_serial, first, prev, reason, n_acks),
+	    TP_ARGS(call, sp),

 	    TP_STRUCT__entry(
 		    __field(unsigned int, call)
@@ -1032,13 +1030,13 @@ TRACE_EVENT(rxrpc_rx_ack,
 			     ),

 	    TP_fast_assign(
-		    __entry->call = call->debug_id;
-		    __entry->serial = serial;
-		    __entry->ack_serial = ack_serial;
-		    __entry->first = first;
-		    __entry->prev = prev;
-		    __entry->reason = reason;
-		    __entry->n_acks = n_acks;
+		    __entry->call	= call->debug_id;
+		    __entry->serial	= sp->hdr.serial;
+		    __entry->ack_serial	= sp->ack.acked_serial;
+		    __entry->first	= sp->ack.first_ack;
+		    __entry->prev	= sp->ack.prev_ack;
+		    __entry->reason	= sp->ack.reason;
+		    __entry->n_acks	= sp->ack.nr_acks;
 			   ),

 	    TP_printk("c=%08x %08x %s r=%08x f=%08x p=%08x n=%u",
@@ -1707,7 +1705,7 @@ TRACE_EVENT(rxrpc_congest,
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
 		    __entry->change = change;
-		    __entry->hard_ack = call->acks_first_seq;
+		    __entry->hard_ack = call->acks_hard_ack;
 		    __entry->top = call->tx_top;
 		    __entry->lowest_nak = call->acks_lowest_nak;
 		    __entry->ack_serial = ack_serial;
@@ -1754,7 +1752,7 @@ TRACE_EVENT(rxrpc_reset_cwnd,
 		    __entry->mode = call->cong_mode;
 		    __entry->cwnd = call->cong_cwnd;
 		    __entry->extra = call->cong_extra;
-		    __entry->hard_ack = call->acks_first_seq;
+		    __entry->hard_ack = call->acks_hard_ack;
 		    __entry->prepared = call->send_top - call->tx_bottom;
 		    __entry->since_last_tx = ktime_sub(now, call->tx_last_sent);
 		    __entry->has_data = call->tx_bottom != call->tx_top;
@@ -1855,7 +1853,7 @@ TRACE_EVENT(rxrpc_resend,
 	    TP_fast_assign(
 		    struct rxrpc_skb_priv *sp = ack ? rxrpc_skb(ack) : NULL;
 		    __entry->call = call->debug_id;
-		    __entry->seq = call->acks_first_seq;
+		    __entry->seq = call->acks_hard_ack;
 		    __entry->transmitted = call->tx_transmitted;
 		    __entry->ack_serial = sp ? sp->hdr.serial : 0;
 			   ),
@@ -1944,7 +1942,7 @@ TRACE_EVENT(rxrpc_call_reset,
 		    __entry->call_id = call->call_id;
 		    __entry->call_serial = call->rx_serial;
 		    __entry->conn_serial = call->conn->hi_serial;
-		    __entry->tx_seq = call->acks_first_seq;
+		    __entry->tx_seq = call->acks_hard_ack;
 		    __entry->rx_seq = call->rx_highest_seq;
 			   ),

@@ -1976,38 +1974,36 @@ TRACE_EVENT(rxrpc_notify_socket,
 	    );

 TRACE_EVENT(rxrpc_rx_discard_ack,
-	    TP_PROTO(unsigned int debug_id, rxrpc_serial_t serial,
-		     rxrpc_seq_t first_soft_ack, rxrpc_seq_t call_ackr_first,
-		     rxrpc_seq_t prev_pkt, rxrpc_seq_t call_ackr_prev),
+	    TP_PROTO(struct rxrpc_call *call, rxrpc_serial_t serial,
+		     rxrpc_seq_t hard_ack, rxrpc_seq_t prev_pkt),

-	    TP_ARGS(debug_id, serial, first_soft_ack, call_ackr_first,
-		    prev_pkt, call_ackr_prev),
+	    TP_ARGS(call, serial, hard_ack, prev_pkt),

 	    TP_STRUCT__entry(
 		    __field(unsigned int, debug_id)
 		    __field(rxrpc_serial_t, serial)
-		    __field(rxrpc_seq_t, first_soft_ack)
-		    __field(rxrpc_seq_t, call_ackr_first)
+		    __field(rxrpc_seq_t, hard_ack)
 		    __field(rxrpc_seq_t, prev_pkt)
-		    __field(rxrpc_seq_t, call_ackr_prev)
+		    __field(rxrpc_seq_t, acks_hard_ack)
+		    __field(rxrpc_seq_t, acks_prev_seq)
 			     ),

 	    TP_fast_assign(
-		    __entry->debug_id = debug_id;
+		    __entry->debug_id = call->debug_id;
 		    __entry->serial = serial;
-		    __entry->first_soft_ack = first_soft_ack;
-		    __entry->call_ackr_first = call_ackr_first;
+		    __entry->hard_ack = hard_ack;
 		    __entry->prev_pkt = prev_pkt;
-		    __entry->call_ackr_prev = call_ackr_prev;
+		    __entry->acks_hard_ack = call->acks_hard_ack;
+		    __entry->acks_prev_seq = call->acks_prev_seq;
 			   ),

 	    TP_printk("c=%08x r=%08x %08x<%08x %08x<%08x",
 		      __entry->debug_id,
 		      __entry->serial,
-		      __entry->first_soft_ack,
-		      __entry->call_ackr_first,
+		      __entry->hard_ack,
+		      __entry->acks_hard_ack,
 		      __entry->prev_pkt,
-		      __entry->call_ackr_prev)
+		      __entry->acks_prev_seq)
 	    );

 TRACE_EVENT(rxrpc_req_ack,
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 6683043cee3f..3e57cef7385f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -757,7 +757,7 @@ struct rxrpc_call {

 	/* Transmission-phase ACK management (ACKs we've received). */
 	ktime_t			acks_latest_ts;	/* Timestamp of latest ACK received */
-	rxrpc_seq_t		acks_first_seq;	/* first sequence number received */
+	rxrpc_seq_t		acks_hard_ack;	/* Highest sequence hard acked */
 	rxrpc_seq_t		acks_prev_seq;	/* Highest previousPacket received */
 	rxrpc_seq_t		acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */
 	rxrpc_serial_t		acks_highest_serial; /* Highest serial number ACK'd */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index afb87a3322da..b89fd0dee324 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -782,12 +782,12 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb
  */
 static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
 					      struct rxrpc_ack_summary *summary,
-					      rxrpc_seq_t seq)
+					      rxrpc_seq_t hard_ack)
 {
 	struct sk_buff *skb = call->cong_last_nack;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	unsigned int i, new_acks = 0, retained_nacks = 0;
-	rxrpc_seq_t old_seq = sp->ack.first_ack;
+	rxrpc_seq_t seq = hard_ack + 1, old_seq = sp->ack.first_ack;
 	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);

 	if (after_eq(seq, old_seq + sp->ack.nr_acks)) {
@@ -810,7 +810,7 @@ static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
 		summary->nr_retained_nacks = retained_nacks;
 	}

-	return old_seq + sp->ack.nr_acks;
+	return old_seq + sp->ack.nr_acks - 1;
 }

 /*
@@ -825,22 +825,23 @@ static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
 static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 				  struct rxrpc_ack_summary *summary,
 				  struct sk_buff *skb,
-				  rxrpc_seq_t seq,
 				  rxrpc_seq_t since)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	unsigned int i, old_nacks = 0;
-	rxrpc_seq_t lowest_nak = seq + sp->ack.nr_acks;
+	rxrpc_seq_t lowest_nak = call->acks_hard_ack + sp->ack.nr_acks + 1;
+	rxrpc_seq_t seq = call->acks_hard_ack;
 	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);

 	for (i = 0; i < sp->ack.nr_acks; i++) {
+		seq++;
 		if (acks[i] == RXRPC_ACK_TYPE_ACK) {
 			summary->nr_acks++;
-			if (after_eq(seq, since))
+			if (after(seq, since))
 				summary->nr_new_acks++;
 		} else {
 			summary->saw_nacks = true;
-			if (before(seq, since)) {
+			if (before_eq(seq, since)) {
 				/* Overlap with previous ACK */
 				old_nacks++;
 			} else {
@@ -851,7 +852,6 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 			if (before(seq, lowest_nak))
 				lowest_nak = seq;
 		}
-		seq++;
 	}

 	if (lowest_nak != call->acks_lowest_nak) {
@@ -874,21 +874,21 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call,
  * with respect to the ack state conveyed by preceding ACKs.
  */
 static bool rxrpc_is_ack_valid(struct rxrpc_call *call,
-			       rxrpc_seq_t first_pkt, rxrpc_seq_t prev_pkt)
+			       rxrpc_seq_t hard_ack, rxrpc_seq_t prev_pkt)
 {
-	rxrpc_seq_t base = READ_ONCE(call->acks_first_seq);
+	rxrpc_seq_t base = READ_ONCE(call->acks_hard_ack);

-	if (after(first_pkt, base))
+	if (after(hard_ack, base))
 		return true; /* The window advanced */

-	if (before(first_pkt, base))
+	if (before(hard_ack, base))
 		return false; /* firstPacket regressed */

 	if (after_eq(prev_pkt, call->acks_prev_seq))
 		return true; /* previousPacket hasn't regressed. */

 	/* Some rx implementations put a serial number in previousPacket. */
-	if (after_eq(prev_pkt, base + call->tx_winsize))
+	if (after(prev_pkt, base + call->tx_winsize))
 		return false;
 	return true;
 }
@@ -906,8 +906,8 @@ static bool rxrpc_is_ack_valid(struct rxrpc_call *call,
 static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_ack_summary summary = { 0 };
-	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_acktrailer trailer;
+	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	rxrpc_serial_t ack_serial, acked_serial;
 	rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since;
 	int nr_acks, offset, ioffset;
@@ -925,9 +925,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ?
 			      sp->ack.reason : RXRPC_ACK__INVALID);

-	trace_rxrpc_rx_ack(call, ack_serial, acked_serial,
-			   first_soft_ack, prev_pkt,
-			   summary.ack_reason, nr_acks);
+	trace_rxrpc_rx_ack(call, sp);
 	rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]);

 	if (acked_serial != 0) {
@@ -952,7 +950,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	 * lost the call because it switched to a different peer.
 	 */
 	if (unlikely(summary.ack_reason == RXRPC_ACK_EXCEEDS_WINDOW) &&
-	    first_soft_ack == 1 &&
+	    hard_ack == 0 &&
 	    prev_pkt == 0 &&
 	    rxrpc_is_client_call(call)) {
 		rxrpc_set_call_completion(call, RXRPC_CALL_REMOTELY_ABORTED,
@@ -965,7 +963,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	 * if we still have it buffered to the beginning.
 	 */
 	if (unlikely(summary.ack_reason == RXRPC_ACK_OUT_OF_SEQUENCE) &&
-	    first_soft_ack == 1 &&
+	    hard_ack == 0 &&
 	    prev_pkt == 0 &&
 	    call->tx_bottom == 0 &&
 	    rxrpc_is_client_call(call)) {
@@ -975,10 +973,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	}

 	/* Discard any out-of-order or duplicate ACKs (outside lock). */
-	if (!rxrpc_is_ack_valid(call, first_soft_ack, prev_pkt)) {
-		trace_rxrpc_rx_discard_ack(call->debug_id, ack_serial,
-					   first_soft_ack, call->acks_first_seq,
-					   prev_pkt, call->acks_prev_seq);
+	if (!rxrpc_is_ack_valid(call, hard_ack, prev_pkt)) {
+		trace_rxrpc_rx_discard_ack(call, ack_serial, hard_ack, prev_pkt);
 		goto send_response;
 	}

@@ -992,17 +988,17 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 		skb_condense(skb);

 	if (call->cong_last_nack) {
-		since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack);
+		since = rxrpc_input_check_prev_ack(call, &summary, hard_ack);
 		rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
 		call->cong_last_nack = NULL;
 	} else {
-		summary.nr_new_acks = first_soft_ack - call->acks_first_seq;
-		call->acks_lowest_nak = first_soft_ack + nr_acks;
-		since = first_soft_ack;
+		summary.nr_new_acks = hard_ack - call->acks_hard_ack;
+		call->acks_lowest_nak = hard_ack + nr_acks;
+		since = hard_ack;
 	}

 	call->acks_latest_ts = skb->tstamp;
-	call->acks_first_seq = first_soft_ack;
+	call->acks_hard_ack = hard_ack;
 	call->acks_prev_seq = prev_pkt;

 	switch (summary.ack_reason) {
@@ -1018,7 +1014,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	if (trailer.maxMTU)
 		rxrpc_input_ack_trailer(call, skb, &trailer);

-	if (first_soft_ack == 0)
+	if (hard_ack + 1 == 0)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero);

 	/* Ignore ACKs unless we are or have just been transmitting. */
@@ -1048,7 +1044,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	if (nr_acks > 0) {
 		if (offset > (int)skb->len - nr_acks)
 			return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack);
-		rxrpc_input_soft_acks(call, &summary, skb, first_soft_ack, since);
+		rxrpc_input_soft_acks(call, &summary, skb, since);
 		rxrpc_get_skb(skb, rxrpc_skb_get_last_nack);
 		call->cong_last_nack = skb;
 	}

From patchwork Wed Dec 4 07:46:49 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893338
X-Patchwork-Delegate: kuba@kernel.org
smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="iiyaZYGM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733298530; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yms2zrhSiXiiiUZtdHkRN+r4vsd4Em/T3BloXlOTbnY=; b=iiyaZYGMdvE1nsDisvlEOXMRQIN4Aj3cIZdcUVOc3gzKkelt4F7OO+fUkEHIHqX9WvYU17 F0nMofRznssM0M+UJz+oF8JNj1KC0CluomDcLCMFmGi3PmwnbqfZLBqJXjGqNhPtxUfaak aRlKMLc+C6U470YvXlu0OSGmJzyoVg8= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-117-0nf7goADPVSzpYMIr35Z7Q-1; Wed, 04 Dec 2024 02:48:46 -0500 X-MC-Unique: 0nf7goADPVSzpYMIr35Z7Q-1 X-Mimecast-MFC-AGG-ID: 0nf7goADPVSzpYMIr35Z7Q Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4F6A71955D4E; Wed, 4 Dec 2024 07:48:45 +0000 (UTC) Received: from warthog.procyon.org.uk.com (unknown [10.42.28.48]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C67ED3000197; Wed, 4 Dec 2024 07:48:42 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc 
Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 21/39] rxrpc: Display stats about jumbo packets transmitted and received Date: Wed, 4 Dec 2024 07:46:49 +0000 Message-ID: <20241204074710.990092-22-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> In /proc/net/rxrpc/stats, display statistics about the numbers of jumbo packets of different sizes transmitted and received, showing counts for 1 subpacket (i.e. a non-jumbo packet), 2 subpackets, 3, ... up to 9, and then 10+. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/ar-internal.h | 2 ++ net/rxrpc/input.c | 6 +++++- net/rxrpc/output.c | 5 ++++- net/rxrpc/proc.c | 26 ++++++++++++++++++++++++++ 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 3e57cef7385f..840293f913a3 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -111,6 +111,8 @@ struct rxrpc_net { atomic_t stat_tx_ack_skip; atomic_t stat_tx_acks[256]; atomic_t stat_rx_acks[256]; + atomic_t stat_tx_jumbo[10]; + atomic_t stat_rx_jumbo[10]; atomic_t stat_why_req_ack[8]; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index b89fd0dee324..8d7ab4b9d7d0 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -568,7 +568,7 @@ static bool rxrpc_input_split_jumbo(struct rxrpc_call *call, struct sk_buff *skb unsigned int offset = sizeof(struct rxrpc_wire_header); unsigned int len = skb->len - offset; bool notify = false; - int
ack_reason = 0; + int ack_reason = 0, count = 1, stat_ix; while (sp->hdr.flags & RXRPC_JUMBO_PACKET) { if (len < RXRPC_JUMBO_SUBPKTLEN) @@ -597,12 +597,16 @@ static bool rxrpc_input_split_jumbo(struct rxrpc_call *call, struct sk_buff *skb sp->hdr.serial++; offset += RXRPC_JUMBO_SUBPKTLEN; len -= RXRPC_JUMBO_SUBPKTLEN; + count++; } sp->offset = offset; sp->len = len; rxrpc_input_data_one(call, skb, &notify, &ack_serial, &ack_reason); + stat_ix = umin(count, ARRAY_SIZE(call->rxnet->stat_rx_jumbo)) - 1; + atomic_inc(&call->rxnet->stat_rx_jumbo[stat_ix]); + if (ack_reason > 0) { rxrpc_send_ACK(call, ack_reason, ack_serial, rxrpc_propose_ack_input_data); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index c2044d593237..3886777d1bb6 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -552,10 +552,13 @@ void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req rxrpc_seq_t seq = req->seq; size_t len; bool new_call = test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags); - int ret; + int ret, stat_ix; _enter("%x,%x-%x", tq->qbase, seq, seq + req->n - 1); + stat_ix = umin(req->n, ARRAY_SIZE(call->rxnet->stat_tx_jumbo)) - 1; + atomic_inc(&call->rxnet->stat_tx_jumbo[stat_ix]); + len = rxrpc_prepare_data_packet(call, req); txb = tq->bufs[seq & RXRPC_TXQ_MASK]; diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index a8325b8e33c2..5f974ec13d69 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -529,6 +529,30 @@ int rxrpc_stats_show(struct seq_file *seq, void *v) atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_retrans]), atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_slow_start]), atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_small_txwin])); + seq_printf(seq, + "Jumbo-Tx : %u,%u,%u,%u,%u,%u,%u,%u,%u,%u\n", + atomic_read(&rxnet->stat_tx_jumbo[0]), + atomic_read(&rxnet->stat_tx_jumbo[1]), + atomic_read(&rxnet->stat_tx_jumbo[2]), + atomic_read(&rxnet->stat_tx_jumbo[3]), + atomic_read(&rxnet->stat_tx_jumbo[4]), +
atomic_read(&rxnet->stat_tx_jumbo[5]), + atomic_read(&rxnet->stat_tx_jumbo[6]), + atomic_read(&rxnet->stat_tx_jumbo[7]), + atomic_read(&rxnet->stat_tx_jumbo[8]), + atomic_read(&rxnet->stat_tx_jumbo[9])); + seq_printf(seq, + "Jumbo-Rx : %u,%u,%u,%u,%u,%u,%u,%u,%u,%u\n", + atomic_read(&rxnet->stat_rx_jumbo[0]), + atomic_read(&rxnet->stat_rx_jumbo[1]), + atomic_read(&rxnet->stat_rx_jumbo[2]), + atomic_read(&rxnet->stat_rx_jumbo[3]), + atomic_read(&rxnet->stat_rx_jumbo[4]), + atomic_read(&rxnet->stat_rx_jumbo[5]), + atomic_read(&rxnet->stat_rx_jumbo[6]), + atomic_read(&rxnet->stat_rx_jumbo[7]), + atomic_read(&rxnet->stat_rx_jumbo[8]), + atomic_read(&rxnet->stat_rx_jumbo[9])); seq_printf(seq, "Buffers : txb=%u rxb=%u\n", atomic_read(&rxrpc_nr_txbuf), @@ -566,6 +590,8 @@ int rxrpc_stats_clear(struct file *file, char *buf, size_t size) atomic_set(&rxnet->stat_tx_ack_skip, 0); memset(&rxnet->stat_tx_acks, 0, sizeof(rxnet->stat_tx_acks)); memset(&rxnet->stat_rx_acks, 0, sizeof(rxnet->stat_rx_acks)); + memset(&rxnet->stat_tx_jumbo, 0, sizeof(rxnet->stat_tx_jumbo)); + memset(&rxnet->stat_rx_jumbo, 0, sizeof(rxnet->stat_rx_jumbo)); memset(&rxnet->stat_why_req_ack, 0, sizeof(rxnet->stat_why_req_ack));
From patchwork Wed Dec 4 07:46:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893339 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 22/39] rxrpc: Adjust names and types of congestion-related fields Date: Wed, 4 Dec 2024 07:46:50 +0000 Message-ID: <20241204074710.990092-23-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Adjust some of the names of fields and constants to make them look a bit more like the TCP congestion symbol names, such as flight_size -> in_flight and congest_mode -> ca_state. Move the persistent congestion-related fields from the rxrpc_ack_summary struct into the rxrpc_call struct rather than copying them out and back in again. The rxrpc_congest tracepoint can fetch them from the call struct. Rename the counters for soft acks and nacks to have an 's' on the front to reflect the softness, e.g.
nr_acks -> nr_sacks. Make fields counting numbers of packets or numbers of acks u16 rather than u8 to allow for windows of up to 8192 DATA packets in flight in future. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 58 ++++++++------ net/rxrpc/ar-internal.h | 51 ++++++------ net/rxrpc/conn_client.c | 4 +- net/rxrpc/input.c | 151 ++++++++++++++++------------------- net/rxrpc/output.c | 2 +- 5 files changed, 132 insertions(+), 134 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 91108e0de3af..d47b8235fad3 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -378,11 +378,11 @@ EM(rxrpc_propose_ack_rx_idle, "RxIdle ") \ E_(rxrpc_propose_ack_terminal_ack, "ClTerm ") -#define rxrpc_congest_modes \ - EM(RXRPC_CALL_CONGEST_AVOIDANCE, "CongAvoid") \ - EM(RXRPC_CALL_FAST_RETRANSMIT, "FastReTx ") \ - EM(RXRPC_CALL_PACKET_LOSS, "PktLoss ") \ - E_(RXRPC_CALL_SLOW_START, "SlowStart") +#define rxrpc_ca_states \ + EM(RXRPC_CA_CONGEST_AVOIDANCE, "CongAvoid") \ + EM(RXRPC_CA_FAST_RETRANSMIT, "FastReTx ") \ + EM(RXRPC_CA_PACKET_LOSS, "PktLoss ") \ + E_(RXRPC_CA_SLOW_START, "SlowStart") #define rxrpc_congest_changes \ EM(rxrpc_cong_begin_retransmission, " Retrans") \ @@ -550,11 +550,11 @@ enum rxrpc_txqueue_trace { rxrpc_txqueue_traces } __mode(byte); rxrpc_abort_reasons; rxrpc_bundle_traces; +rxrpc_ca_states; rxrpc_call_poke_traces; rxrpc_call_traces; rxrpc_client_traces; rxrpc_congest_changes; -rxrpc_congest_modes; rxrpc_conn_traces; rxrpc_local_traces; rxrpc_pmtud_reduce_traces; @@ -1688,27 +1688,39 @@ TRACE_EVENT(rxrpc_retransmit, TRACE_EVENT(rxrpc_congest, TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, - rxrpc_serial_t ack_serial, enum rxrpc_congest_change change), + rxrpc_serial_t ack_serial), - TP_ARGS(call, summary, 
ack_serial, change), + TP_ARGS(call, summary, ack_serial), TP_STRUCT__entry( __field(unsigned int, call) - __field(enum rxrpc_congest_change, change) + __field(enum rxrpc_ca_state, ca_state) __field(rxrpc_seq_t, hard_ack) __field(rxrpc_seq_t, top) __field(rxrpc_seq_t, lowest_nak) __field(rxrpc_serial_t, ack_serial) + __field(u16, nr_sacks) + __field(u16, nr_snacks) + __field(u16, cwnd) + __field(u16, ssthresh) + __field(u16, cumul_acks) + __field(u16, dup_acks) __field_struct(struct rxrpc_ack_summary, sum) ), TP_fast_assign( __entry->call = call->debug_id; - __entry->change = change; + __entry->ca_state = call->cong_ca_state; __entry->hard_ack = call->acks_hard_ack; __entry->top = call->tx_top; __entry->lowest_nak = call->acks_lowest_nak; __entry->ack_serial = ack_serial; + __entry->nr_sacks = call->acks_nr_sacks; + __entry->nr_snacks = call->acks_nr_snacks; + __entry->cwnd = call->cong_cwnd; + __entry->ssthresh = call->cong_ssthresh; + __entry->cumul_acks = call->cong_cumul_acks; + __entry->dup_acks = call->cong_dup_acks; memcpy(&__entry->sum, summary, sizeof(__entry->sum)); ), @@ -1717,17 +1729,17 @@ TRACE_EVENT(rxrpc_congest, __entry->ack_serial, __print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names), __entry->hard_ack, - __print_symbolic(__entry->sum.mode, rxrpc_congest_modes), - __entry->sum.cwnd, - __entry->sum.ssthresh, - __entry->sum.nr_acks, __entry->sum.nr_retained_nacks, - __entry->sum.nr_new_acks, - __entry->sum.nr_new_nacks, + __print_symbolic(__entry->ca_state, rxrpc_ca_states), + __entry->cwnd, + __entry->ssthresh, + __entry->nr_sacks, __entry->sum.nr_retained_snacks, + __entry->sum.nr_new_sacks, + __entry->sum.nr_new_snacks, __entry->top - __entry->hard_ack, - __entry->sum.cumulative_acks, - __entry->sum.dup_acks, - __entry->lowest_nak, __entry->sum.new_low_nack ? "!" : "", - __print_symbolic(__entry->change, rxrpc_congest_changes), + __entry->cumul_acks, + __entry->dup_acks, + __entry->lowest_nak, __entry->sum.new_low_snack ? "!" 
: "", + __print_symbolic(__entry->sum.change, rxrpc_congest_changes), __entry->sum.retrans_timeo ? " rTxTo" : "") ); @@ -1738,7 +1750,7 @@ TRACE_EVENT(rxrpc_reset_cwnd, TP_STRUCT__entry( __field(unsigned int, call) - __field(enum rxrpc_congest_mode, mode) + __field(enum rxrpc_ca_state, ca_state) __field(unsigned short, cwnd) __field(unsigned short, extra) __field(rxrpc_seq_t, hard_ack) @@ -1749,7 +1761,7 @@ TRACE_EVENT(rxrpc_reset_cwnd, TP_fast_assign( __entry->call = call->debug_id; - __entry->mode = call->cong_mode; + __entry->ca_state = call->cong_ca_state; __entry->cwnd = call->cong_cwnd; __entry->extra = call->cong_extra; __entry->hard_ack = call->acks_hard_ack; @@ -1761,7 +1773,7 @@ TRACE_EVENT(rxrpc_reset_cwnd, TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu d=%u", __entry->call, __entry->hard_ack, - __print_symbolic(__entry->mode, rxrpc_congest_modes), + __print_symbolic(__entry->ca_state, rxrpc_ca_states), __entry->cwnd, __entry->extra, __entry->prepared, diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 840293f913a3..f6e6b2ab6c2a 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -623,13 +623,13 @@ enum rxrpc_call_state { /* * Call Tx congestion management modes. 
*/ -enum rxrpc_congest_mode { - RXRPC_CALL_SLOW_START, - RXRPC_CALL_CONGEST_AVOIDANCE, - RXRPC_CALL_PACKET_LOSS, - RXRPC_CALL_FAST_RETRANSMIT, - NR__RXRPC_CONGEST_MODES -}; +enum rxrpc_ca_state { + RXRPC_CA_SLOW_START, + RXRPC_CA_CONGEST_AVOIDANCE, + RXRPC_CA_PACKET_LOSS, + RXRPC_CA_FAST_RETRANSMIT, + NR__RXRPC_CA_STATES +} __mode(byte); /* * RxRPC call definition @@ -727,12 +727,12 @@ struct rxrpc_call { */ #define RXRPC_TX_SMSS RXRPC_JUMBO_DATALEN #define RXRPC_MIN_CWND 4 - u8 cong_cwnd; /* Congestion window size */ + enum rxrpc_ca_state cong_ca_state; /* Congestion control state */ u8 cong_extra; /* Extra to send for congestion management */ - u8 cong_ssthresh; /* Slow-start threshold */ - enum rxrpc_congest_mode cong_mode:8; /* Congestion management mode */ - u8 cong_dup_acks; /* Count of ACKs showing missing packets */ - u8 cong_cumul_acks; /* Cumulative ACK count */ + u16 cong_cwnd; /* Congestion window size */ + u16 cong_ssthresh; /* Slow-start threshold */ + u16 cong_dup_acks; /* Count of ACKs showing missing packets */ + u16 cong_cumul_acks; /* Cumulative ACK count */ ktime_t cong_tstamp; /* Last time cwnd was changed */ struct sk_buff *cong_last_nack; /* Last ACK with nacks received */ @@ -763,27 +763,24 @@ struct rxrpc_call { rxrpc_seq_t acks_prev_seq; /* Highest previousPacket received */ rxrpc_seq_t acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */ rxrpc_serial_t acks_highest_serial; /* Highest serial number ACK'd */ + unsigned short acks_nr_sacks; /* Number of soft acks recorded */ + unsigned short acks_nr_snacks; /* Number of soft nacks recorded */ }; /* * Summary of a new ACK and the changes it made to the Tx buffer packet states. 
*/ struct rxrpc_ack_summary { - u16 nr_acks; /* Number of ACKs in packet */ - u16 nr_new_acks; /* Number of new ACKs in packet */ - u16 nr_new_nacks; /* Number of new nacks in packet */ - u16 nr_retained_nacks; /* Number of nacks retained between ACKs */ - u8 ack_reason; - bool saw_nacks; /* Saw NACKs in packet */ - bool new_low_nack; /* T if new low NACK found */ - bool retrans_timeo; /* T if reTx due to timeout happened */ - u8 flight_size; /* Number of unreceived transmissions */ - /* Place to stash values for tracing */ - enum rxrpc_congest_mode mode:8; - u8 cwnd; - u8 ssthresh; - u8 dup_acks; - u8 cumulative_acks; + u16 in_flight; /* Number of unreceived transmissions */ + u16 nr_new_hacks; /* Number of rotated new ACKs */ + u16 nr_new_sacks; /* Number of new soft ACKs in packet */ + u16 nr_new_snacks; /* Number of new soft nacks in packet */ + u16 nr_retained_snacks; /* Number of nacks retained between ACKs */ + u8 ack_reason; + bool saw_snacks:1; /* T if we saw a soft NACK */ + bool new_low_snack:1; /* T if new low soft NACK found */ + bool retrans_timeo:1; /* T if reTx due to timeout happened */ + u8 /*enum rxrpc_congest_change*/ change; }; /* diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 706631e6ac2f..5f76bd90567c 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -437,9 +437,9 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn, call->dest_srx.srx_service = conn->service_id; call->cong_ssthresh = call->peer->cong_ssthresh; if (call->cong_cwnd >= call->cong_ssthresh) - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; else - call->cong_mode = RXRPC_CALL_SLOW_START; + call->cong_ca_state = RXRPC_CA_SLOW_START; chan->call_id = call_id; chan->call_debug_id = call->debug_id; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 8d7ab4b9d7d0..c25d816aafee 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -34,49 +34,41 @@ static void 
rxrpc_congestion_management(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, rxrpc_serial_t acked_serial) { - enum rxrpc_congest_change change = rxrpc_cong_no_change; - unsigned int cumulative_acks = call->cong_cumul_acks; - unsigned int cwnd = call->cong_cwnd; bool resend = false; - summary->flight_size = - (call->tx_top - call->tx_bottom) - summary->nr_acks; + summary->change = rxrpc_cong_no_change; + summary->in_flight = (call->tx_top - call->tx_bottom) - call->acks_nr_sacks; if (test_and_clear_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags)) { summary->retrans_timeo = true; - call->cong_ssthresh = umax(summary->flight_size / 2, 2); - cwnd = 1; - if (cwnd >= call->cong_ssthresh && - call->cong_mode == RXRPC_CALL_SLOW_START) { - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; + call->cong_ssthresh = umax(summary->in_flight / 2, 2); + call->cong_cwnd = 1; + if (call->cong_cwnd >= call->cong_ssthresh && + call->cong_ca_state == RXRPC_CA_SLOW_START) { + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; call->cong_tstamp = skb->tstamp; - cumulative_acks = 0; + call->cong_cumul_acks = 0; } } - cumulative_acks += summary->nr_new_acks; - if (cumulative_acks > 255) - cumulative_acks = 255; + call->cong_cumul_acks += summary->nr_new_sacks; + if (call->cong_cumul_acks > 255) + call->cong_cumul_acks = 255; - summary->cwnd = call->cong_cwnd; - summary->ssthresh = call->cong_ssthresh; - summary->cumulative_acks = cumulative_acks; - summary->dup_acks = call->cong_dup_acks; - - switch (call->cong_mode) { - case RXRPC_CALL_SLOW_START: - if (summary->saw_nacks) + switch (call->cong_ca_state) { + case RXRPC_CA_SLOW_START: + if (summary->saw_snacks) goto packet_loss_detected; - if (summary->cumulative_acks > 0) - cwnd += 1; - if (cwnd >= call->cong_ssthresh) { - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; + if (call->cong_cumul_acks > 0) + call->cong_cwnd += 1; + if (call->cong_cwnd >= call->cong_ssthresh) { + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; 
call->cong_tstamp = skb->tstamp; } goto out; - case RXRPC_CALL_CONGEST_AVOIDANCE: - if (summary->saw_nacks) + case RXRPC_CA_CONGEST_AVOIDANCE: + if (summary->saw_snacks) goto packet_loss_detected; /* We analyse the number of packets that get ACK'd per RTT @@ -88,18 +80,18 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, ktime_add_us(call->cong_tstamp, call->peer->srtt_us >> 3))) goto out_no_clear_ca; - change = rxrpc_cong_rtt_window_end; + summary->change = rxrpc_cong_rtt_window_end; call->cong_tstamp = skb->tstamp; - if (cumulative_acks >= cwnd) - cwnd++; + if (call->cong_cumul_acks >= call->cong_cwnd) + call->cong_cwnd++; goto out; - case RXRPC_CALL_PACKET_LOSS: - if (!summary->saw_nacks) + case RXRPC_CA_PACKET_LOSS: + if (!summary->saw_snacks) goto resume_normality; - if (summary->new_low_nack) { - change = rxrpc_cong_new_low_nack; + if (summary->new_low_snack) { + summary->change = rxrpc_cong_new_low_nack; call->cong_dup_acks = 1; if (call->cong_extra > 1) call->cong_extra = 1; @@ -110,29 +102,29 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, if (call->cong_dup_acks < 3) goto send_extra_data; - change = rxrpc_cong_begin_retransmission; - call->cong_mode = RXRPC_CALL_FAST_RETRANSMIT; - call->cong_ssthresh = umax(summary->flight_size / 2, 2); - cwnd = call->cong_ssthresh + 3; + summary->change = rxrpc_cong_begin_retransmission; + call->cong_ca_state = RXRPC_CA_FAST_RETRANSMIT; + call->cong_ssthresh = umax(summary->in_flight / 2, 2); + call->cong_cwnd = call->cong_ssthresh + 3; call->cong_extra = 0; call->cong_dup_acks = 0; resend = true; goto out; - case RXRPC_CALL_FAST_RETRANSMIT: - if (!summary->new_low_nack) { - if (summary->nr_new_acks == 0) - cwnd += 1; + case RXRPC_CA_FAST_RETRANSMIT: + if (!summary->new_low_snack) { + if (summary->nr_new_sacks == 0) + call->cong_cwnd += 1; call->cong_dup_acks++; if (call->cong_dup_acks == 2) { - change = rxrpc_cong_retransmit_again; + summary->change = rxrpc_cong_retransmit_again; 
call->cong_dup_acks = 0; resend = true; } } else { - change = rxrpc_cong_progress; - cwnd = call->cong_ssthresh; - if (!summary->saw_nacks) + summary->change = rxrpc_cong_progress; + call->cong_cwnd = call->cong_ssthresh; + if (!summary->saw_snacks) goto resume_normality; } goto out; @@ -143,30 +135,27 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, } resume_normality: - change = rxrpc_cong_cleared_nacks; + summary->change = rxrpc_cong_cleared_nacks; call->cong_dup_acks = 0; call->cong_extra = 0; call->cong_tstamp = skb->tstamp; - if (cwnd < call->cong_ssthresh) - call->cong_mode = RXRPC_CALL_SLOW_START; + if (call->cong_cwnd < call->cong_ssthresh) + call->cong_ca_state = RXRPC_CA_SLOW_START; else - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; out: - cumulative_acks = 0; + call->cong_cumul_acks = 0; out_no_clear_ca: - if (cwnd >= RXRPC_TX_MAX_WINDOW) - cwnd = RXRPC_TX_MAX_WINDOW; - call->cong_cwnd = cwnd; - call->cong_cumul_acks = cumulative_acks; - summary->mode = call->cong_mode; - trace_rxrpc_congest(call, summary, acked_serial, change); + if (call->cong_cwnd >= RXRPC_TX_MAX_WINDOW) + call->cong_cwnd = RXRPC_TX_MAX_WINDOW; + trace_rxrpc_congest(call, summary, acked_serial); if (resend) rxrpc_resend(call, skb); return; packet_loss_detected: - change = rxrpc_cong_saw_nack; - call->cong_mode = RXRPC_CALL_PACKET_LOSS; + summary->change = rxrpc_cong_saw_nack; + call->cong_ca_state = RXRPC_CA_PACKET_LOSS; call->cong_dup_acks = 0; goto send_extra_data; @@ -175,7 +164,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, * state. 
*/ if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) || - summary->nr_acks != call->tx_top - call->tx_bottom) { + call->acks_nr_sacks != call->tx_top - call->tx_bottom) { call->cong_extra++; wake_up(&call->waitq); } @@ -189,8 +178,8 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) { ktime_t rtt, now; - if (call->cong_mode != RXRPC_CALL_SLOW_START && - call->cong_mode != RXRPC_CALL_CONGEST_AVOIDANCE) + if (call->cong_ca_state != RXRPC_CA_SLOW_START && + call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE) return; if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY) return; @@ -203,7 +192,7 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) trace_rxrpc_reset_cwnd(call, now); rxrpc_inc_stat(call->rxnet, stat_tx_data_cwnd_reset); call->tx_last_sent = now; - call->cong_mode = RXRPC_CALL_SLOW_START; + call->cong_ca_state = RXRPC_CA_SLOW_START; call->cong_ssthresh = umax(call->cong_ssthresh, call->cong_cwnd * 3 / 4); call->cong_cwnd = umax(call->cong_cwnd / 2, RXRPC_MIN_CWND); } @@ -282,7 +271,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, if (call->acks_lowest_nak == call->tx_bottom) { call->acks_lowest_nak = to; } else if (after(to, call->acks_lowest_nak)) { - summary->new_low_nack = true; + summary->new_low_snack = true; call->acks_lowest_nak = to; } @@ -795,11 +784,11 @@ static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call, u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket); if (after_eq(seq, old_seq + sp->ack.nr_acks)) { - summary->nr_new_acks += sp->ack.nr_nacks; - summary->nr_new_acks += seq - (old_seq + sp->ack.nr_acks); - summary->nr_retained_nacks = 0; + summary->nr_new_sacks += sp->ack.nr_nacks; + summary->nr_new_sacks += seq - (old_seq + sp->ack.nr_acks); + summary->nr_retained_snacks = 0; } else if (seq == old_seq) { - summary->nr_retained_nacks = sp->ack.nr_nacks; + summary->nr_retained_snacks = sp->ack.nr_nacks; } else { for (i = 0; i < 
sp->ack.nr_acks; i++) { if (acks[i] == RXRPC_ACK_TYPE_NACK) { @@ -810,8 +799,8 @@ static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call, } } - summary->nr_new_acks += new_acks; - summary->nr_retained_nacks = retained_nacks; + summary->nr_new_sacks += new_acks; + summary->nr_retained_snacks = retained_nacks; } return old_seq + sp->ack.nr_acks - 1; @@ -840,16 +829,16 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, for (i = 0; i < sp->ack.nr_acks; i++) { seq++; if (acks[i] == RXRPC_ACK_TYPE_ACK) { - summary->nr_acks++; + call->acks_nr_sacks++; if (after(seq, since)) - summary->nr_new_acks++; + summary->nr_new_sacks++; } else { - summary->saw_nacks = true; + summary->saw_snacks = true; if (before_eq(seq, since)) { /* Overlap with previous ACK */ old_nacks++; } else { - summary->nr_new_nacks++; + summary->nr_new_snacks++; sp->ack.nr_nacks++; } @@ -860,7 +849,7 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, if (lowest_nak != call->acks_lowest_nak) { call->acks_lowest_nak = lowest_nak; - summary->new_low_nack = true; + summary->new_low_snack = true; } /* We *can* have more nacks than we did - the peer is permitted to drop @@ -868,9 +857,9 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, * possible for the nack distribution to change whilst the number of * nacks stays the same or goes down. 
*/ - if (old_nacks < summary->nr_retained_nacks) - summary->nr_new_acks += summary->nr_retained_nacks - old_nacks; - summary->nr_retained_nacks = old_nacks; + if (old_nacks < summary->nr_retained_snacks) + summary->nr_new_sacks += summary->nr_retained_snacks - old_nacks; + summary->nr_retained_snacks = old_nacks; } /* @@ -996,7 +985,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); call->cong_last_nack = NULL; } else { - summary.nr_new_acks = hard_ack - call->acks_hard_ack; + summary.nr_new_sacks = hard_ack - call->acks_hard_ack; call->acks_lowest_nak = hard_ack + nr_acks; since = hard_ack; } @@ -1054,7 +1043,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) } if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) && - summary.nr_acks == call->tx_top - hard_ack && + call->acks_nr_sacks == call->tx_top - hard_ack && rxrpc_is_client_call(call)) rxrpc_propose_ping(call, ack_serial, rxrpc_propose_ack_ping_for_lost_reply); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 3886777d1bb6..7ed928b6f0e1 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -419,7 +419,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, why = rxrpc_reqack_ack_lost; else if (txb->flags & RXRPC_TXBUF_RESENT) why = rxrpc_reqack_retrans; - else if (call->cong_mode == RXRPC_CALL_SLOW_START && call->cong_cwnd <= 2) + else if (call->cong_ca_state == RXRPC_CA_SLOW_START && call->cong_cwnd <= 2) why = rxrpc_reqack_slow_start; else if (call->tx_winsize <= 2) why = rxrpc_reqack_small_txwin;
From patchwork Wed Dec 4 07:46:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893340 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 23/39] rxrpc: Use the new rxrpc_tx_queue struct to more efficiently process ACKs Date: Wed, 4 Dec 2024 07:46:51 +0000 Message-ID: <20241204074710.990092-24-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> With the change in the structure of the transmission buffer to store buffers in bunches of 32 or 64 (BITS_PER_LONG), we can place sets of per-buffer flags into the rxrpc_tx_queue struct rather than storing them in rxrpc_tx_buf, thereby vastly increasing efficiency when assessing the SACK table in an ACK packet. Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 86 ++++++++++-- net/rxrpc/ar-internal.h | 23 +++- net/rxrpc/call_event.c | 181 ++++++++++++------------- net/rxrpc/call_object.c | 1 - net/rxrpc/input.c | 252 ++++++++++++++++++++++------------- net/rxrpc/output.c | 10 +- net/rxrpc/sendmsg.c | 3 + 7 files changed, 352 insertions(+), 204 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index d47b8235fad3..609522a5bd0f 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -132,7 +132,6 @@ EM(rxrpc_skb_get_call_rx, "GET call-rx ") \ EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \ EM(rxrpc_skb_get_conn_work, "GET conn-work") \ - EM(rxrpc_skb_get_last_nack, "GET last-nack") \ EM(rxrpc_skb_get_local_work, "GET locl-work") \ EM(rxrpc_skb_get_reject_work, "GET rej-work ") \ EM(rxrpc_skb_get_to_recvmsg, "GET to-recv ") \ @@ -147,7 +146,6 @@ EM(rxrpc_skb_put_error_report, "PUT error-rep") \ EM(rxrpc_skb_put_input, "PUT input ") \ EM(rxrpc_skb_put_jumbo_subpacket, "PUT jumbo-sub") \ - EM(rxrpc_skb_put_last_nack, "PUT last-nack") \ EM(rxrpc_skb_put_purge, "PUT purge ") \ EM(rxrpc_skb_put_rotate, "PUT rotate ") \ EM(rxrpc_skb_put_unknown, "PUT unknown ") \ @@ -499,6 +497,11 @@ EM(rxrpc_pmtud_reduce_icmp, "Icmp ") \ E_(rxrpc_pmtud_reduce_route, "Route") +#define rxrpc_rotate_traces \ + EM(rxrpc_rotate_trace_hack, "hard-ack") \ + EM(rxrpc_rotate_trace_sack, "soft-ack") \ + E_(rxrpc_rotate_trace_snak, "soft-nack") + /* * Generate enums for tracing information. 
  */
@@ -525,6 +528,7 @@ enum rxrpc_propose_ack_trace { rxrpc_propose_ack_traces } __mode(byte);
 enum rxrpc_receive_trace	{ rxrpc_receive_traces } __mode(byte);
 enum rxrpc_recvmsg_trace	{ rxrpc_recvmsg_traces } __mode(byte);
 enum rxrpc_req_ack_trace	{ rxrpc_req_ack_traces } __mode(byte);
+enum rxrpc_rotate_trace		{ rxrpc_rotate_traces } __mode(byte);
 enum rxrpc_rtt_rx_trace		{ rxrpc_rtt_rx_traces } __mode(byte);
 enum rxrpc_rtt_tx_trace		{ rxrpc_rtt_tx_traces } __mode(byte);
 enum rxrpc_sack_trace		{ rxrpc_sack_traces } __mode(byte);
@@ -562,6 +566,7 @@ rxrpc_propose_ack_traces;
 rxrpc_receive_traces;
 rxrpc_recvmsg_traces;
 rxrpc_req_ack_traces;
+rxrpc_rotate_traces;
 rxrpc_rtt_rx_traces;
 rxrpc_rtt_tx_traces;
 rxrpc_sack_traces;
@@ -1667,6 +1672,7 @@ TRACE_EVENT(rxrpc_retransmit,
 
 	    TP_STRUCT__entry(
 		    __field(unsigned int,	call)
+		    __field(unsigned int,	qbase)
 		    __field(rxrpc_seq_t,	seq)
 		    __field(rxrpc_serial_t,	serial)
 		    __field(ktime_t,		expiry)
@@ -1674,13 +1680,15 @@ TRACE_EVENT(rxrpc_retransmit,
 
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
+		    __entry->qbase = req->tq->qbase;
 		    __entry->seq = req->seq;
 		    __entry->serial = txb->serial;
 		    __entry->expiry = expiry;
 			   ),
 
-	    TP_printk("c=%08x q=%x r=%x xp=%lld",
+	    TP_printk("c=%08x tq=%x q=%x r=%x xp=%lld",
 		      __entry->call,
+		      __entry->qbase,
 		      __entry->seq,
 		      __entry->serial,
 		      ktime_to_us(__entry->expiry))
@@ -1724,7 +1732,7 @@ TRACE_EVENT(rxrpc_congest,
 		    memcpy(&__entry->sum, summary, sizeof(__entry->sum));
 			   ),
 
-	    TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u,%u b=%u u=%u d=%u l=%x%s%s%s",
+	    TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u A=%u+%u/%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s",
 		      __entry->call,
 		      __entry->ack_serial,
 		      __print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names),
@@ -1732,9 +1740,9 @@ TRACE_EVENT(rxrpc_congest,
 		      __print_symbolic(__entry->ca_state, rxrpc_ca_states),
 		      __entry->cwnd,
 		      __entry->ssthresh,
-		      __entry->nr_sacks, __entry->sum.nr_retained_snacks,
-		      __entry->sum.nr_new_sacks,
-		      __entry->sum.nr_new_snacks,
+		      __entry->nr_sacks, __entry->sum.nr_new_sacks,
+		      __entry->nr_snacks, __entry->sum.nr_new_snacks,
+		      __entry->sum.nr_new_hacks,
 		      __entry->top - __entry->hard_ack,
 		      __entry->cumul_acks,
 		      __entry->dup_acks,
@@ -1850,10 +1858,36 @@ TRACE_EVENT(rxrpc_connect_call,
 			   &__entry->srx.transport)
 	    );
 
+TRACE_EVENT(rxrpc_apply_acks,
+	    TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq),
+
+	    TP_ARGS(call, tq),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,	call)
+		    __field(unsigned int,	nr_rep)
+		    __field(rxrpc_seq_t,	qbase)
+		    __field(unsigned long,	acks)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call = call->debug_id;
+		    __entry->qbase = tq->qbase;
+		    __entry->acks = tq->segment_acked;
+		    __entry->nr_rep = tq->nr_reported_acks;
+			   ),
+
+	    TP_printk("c=%08x tq=%x acks=%016lx rep=%u",
+		      __entry->call,
+		      __entry->qbase,
+		      __entry->acks,
+		      __entry->nr_rep)
+	    );
+
 TRACE_EVENT(rxrpc_resend,
-	    TP_PROTO(struct rxrpc_call *call, struct sk_buff *ack),
+	    TP_PROTO(struct rxrpc_call *call, rxrpc_serial_t ack_serial),
 
-	    TP_ARGS(call, ack),
+	    TP_ARGS(call, ack_serial),
 
 	    TP_STRUCT__entry(
 		    __field(unsigned int,	call)
@@ -1863,11 +1897,10 @@ TRACE_EVENT(rxrpc_resend,
 			     ),
 
 	    TP_fast_assign(
-		    struct rxrpc_skb_priv *sp = ack ? rxrpc_skb(ack) : NULL;
 		    __entry->call = call->debug_id;
 		    __entry->seq = call->acks_hard_ack;
 		    __entry->transmitted = call->tx_transmitted;
-		    __entry->ack_serial = sp ? sp->hdr.serial : 0;
+		    __entry->ack_serial = ack_serial;
 			   ),
 
 	    TP_printk("c=%08x r=%x q=%x tq=%x",
@@ -1877,6 +1910,37 @@ TRACE_EVENT(rxrpc_resend,
 		      __entry->transmitted)
 	    );
 
+TRACE_EVENT(rxrpc_rotate,
+	    TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq,
+		     struct rxrpc_ack_summary *summary, rxrpc_seq_t seq,
+		     enum rxrpc_rotate_trace trace),
+
+	    TP_ARGS(call, tq, summary, seq, trace),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		call)
+		    __field(rxrpc_seq_t,		qbase)
+		    __field(rxrpc_seq_t,		seq)
+		    __field(unsigned int,		nr_rep)
+		    __field(enum rxrpc_rotate_trace,	trace)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call = call->debug_id;
+		    __entry->qbase = tq->qbase;
+		    __entry->seq = seq;
+		    __entry->nr_rep = tq->nr_reported_acks;
+		    __entry->trace = trace;
+			   ),
+
+	    TP_printk("c=%08x tq=%x q=%x nr=%x %s",
+		      __entry->call,
+		      __entry->qbase,
+		      __entry->seq,
+		      __entry->nr_rep,
+		      __print_symbolic(__entry->trace, rxrpc_rotate_traces))
+	    );
+
 TRACE_EVENT(rxrpc_rx_icmp,
 	    TP_PROTO(struct rxrpc_peer *peer, struct sock_extended_err *ee,
 		     struct sockaddr_rxrpc *srx),
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f6e6b2ab6c2a..9a70f0b86570 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -214,9 +214,8 @@ struct rxrpc_skb_priv {
 			rxrpc_seq_t	first_ack;	/* First packet in acks table */
 			rxrpc_seq_t	prev_ack;	/* Highest seq seen */
 			rxrpc_serial_t	acked_serial;	/* Packet in response to (or 0) */
+			u16		nr_acks;	/* Number of acks+nacks */
 			u8		reason;		/* Reason for ack */
-			u8		nr_acks;	/* Number of acks+nacks */
-			u8		nr_nacks;	/* Number of nacks */
 		} ack;
 	};
 	struct rxrpc_host_header hdr;	/* RxRPC packet header from this packet */
@@ -734,7 +733,6 @@ struct rxrpc_call {
 	u16			cong_dup_acks;	/* Count of ACKs showing missing packets */
 	u16			cong_cumul_acks; /* Cumulative ACK count */
 	ktime_t			cong_tstamp;	/* Last time cwnd was changed */
-	struct sk_buff		*cong_last_nack; /* Last ACK with nacks received */
 
 	/* Receive-phase ACK management (ACKs we send). */
 	u8			ackr_reason;	/* reason to ACK */
@@ -775,11 +773,10 @@ struct rxrpc_ack_summary {
 	u16		nr_new_hacks;		/* Number of rotated new ACKs */
 	u16		nr_new_sacks;		/* Number of new soft ACKs in packet */
 	u16		nr_new_snacks;		/* Number of new soft nacks in packet */
-	u16		nr_retained_snacks;	/* Number of nacks retained between ACKs */
 	u8		ack_reason;
-	bool		saw_snacks:1;		/* T if we saw a soft NACK */
 	bool		new_low_snack:1;	/* T if new low soft NACK found */
 	bool		retrans_timeo:1;	/* T if reTx due to timeout happened */
+	bool		need_retransmit:1;	/* T if we need transmission */
 	u8 /*enum rxrpc_congest_change*/ change;
 };
 
@@ -858,6 +855,10 @@ struct rxrpc_txqueue {
 	struct rxrpc_txqueue	*next;
 	ktime_t			xmit_ts_base;
 	rxrpc_seq_t		qbase;
+	u8			nr_reported_acks; /* Number of segments explicitly acked/nacked */
+	unsigned long		segment_acked;	/* Bit-per-buf: Set if ACK'd */
+	unsigned long		segment_lost;	/* Bit-per-buf: Set if declared lost */
+	unsigned long		segment_retransmitted; /* Bit-per-buf: Set if retransmitted */
 
 	/* The arrays we want to pack into as few cache lines as possible. */
 	struct {
@@ -935,7 +936,7 @@ void rxrpc_propose_ping(struct rxrpc_call *call, u32 serial,
 			enum rxrpc_propose_ack_trace why);
 void rxrpc_propose_delay_ACK(struct rxrpc_call *, rxrpc_serial_t,
 			     enum rxrpc_propose_ack_trace);
-void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb);
+void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_response);
 
 bool rxrpc_input_call_event(struct rxrpc_call *call);
 
@@ -1383,6 +1384,16 @@ static inline bool after_eq(u32 seq1, u32 seq2)
 	return (s32)(seq1 - seq2) >= 0;
 }
 
+static inline u32 earliest(u32 seq1, u32 seq2)
+{
+	return before(seq1, seq2) ? seq1 : seq2;
+}
+
+static inline u32 latest(u32 seq1, u32 seq2)
+{
+	return after(seq1, seq2) ? seq1 : seq2;
+}
+
 static inline void rxrpc_queue_rx_call_packet(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	rxrpc_get_skb(skb, rxrpc_skb_get_call_rx);
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 2311e5c737e8..e25921d39d4d 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -65,9 +65,9 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call)
 /*
  * Retransmit one or more packets.
  */
-static void rxrpc_retransmit_data(struct rxrpc_call *call,
+static bool rxrpc_retransmit_data(struct rxrpc_call *call,
 				  struct rxrpc_send_data_req *req,
-				  ktime_t rto)
+				  ktime_t rto, bool skip_too_young)
 {
 	struct rxrpc_txqueue *tq = req->tq;
 	unsigned int ix = req->seq & RXRPC_TXQ_MASK;
@@ -78,9 +78,11 @@ static void rxrpc_retransmit_data(struct rxrpc_call *call,
 
 	xmit_ts = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]);
 	resend_at = ktime_add(xmit_ts, rto);
-	trace_rxrpc_retransmit(call, req, txb,
-			       ktime_sub(resend_at, req->now));
+	trace_rxrpc_retransmit(call, req, txb, ktime_sub(resend_at, req->now));
+	if (skip_too_young && ktime_after(resend_at, req->now))
+		return false;
 
+	__set_bit(ix, &tq->segment_retransmitted);
 	txb->flags |= RXRPC_TXBUF_RESENT;
 	rxrpc_send_data_packet(call, req);
 	rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans);
@@ -89,128 +91,119 @@ static void rxrpc_retransmit_data(struct rxrpc_call *call,
 	req->n = 0;
 	req->did_send = true;
 	req->now = ktime_get_real();
+	return true;
 }
 
 /*
  * Perform retransmission of NAK'd and unack'd packets.
  */
-void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
+void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_response)
 {
 	struct rxrpc_send_data_req req = {
 		.now = ktime_get_real(),
 	};
-	struct rxrpc_ackpacket *ack = NULL;
-	struct rxrpc_skb_priv *sp;
-	struct rxrpc_txqueue *tq;
-	struct rxrpc_txbuf *txb;
-	rxrpc_seq_t transmitted = call->tx_transmitted, seq;
-	ktime_t next_resend = KTIME_MAX, rto = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC);
-	ktime_t resend_at = KTIME_MAX, delay;
-	bool unacked = false, did_send = false;
-	unsigned int qix;
+	struct rxrpc_txqueue *tq = call->tx_queue;
+	ktime_t lowest_xmit_ts = KTIME_MAX, rto = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC);
+	bool unacked = false;
 
 	_enter("{%d,%d}", call->tx_bottom, call->tx_top);
 
-	if (call->tx_bottom == call->tx_top)
-		goto no_resend;
+	if (call->tx_bottom == call->tx_top) {
+		call->resend_at = KTIME_MAX;
+		trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend);
+		return;
+	}
 
-	trace_rxrpc_resend(call, ack_skb);
-	tq = call->tx_queue;
-	seq = call->tx_bottom;
+	trace_rxrpc_resend(call, ack_serial);
 
-	/* Scan the soft ACK table and resend any explicitly NAK'd packets. */
-	if (ack_skb) {
-		sp = rxrpc_skb(ack_skb);
-		ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header);
+	/* Scan the transmission queue, looking for explicitly NAK'd packets. */
+	do {
+		unsigned long naks = ~tq->segment_acked;
+		rxrpc_seq_t tq_top = tq->qbase + RXRPC_NR_TXQUEUE - 1;
 
-		for (int i = 0; i < sp->ack.nr_acks; i++) {
-			rxrpc_seq_t aseq;
+		if (after(tq->qbase, call->tx_transmitted))
+			break;
 
-			if (ack->acks[i] & 1)
-				continue;
-			aseq = sp->ack.first_ack + i;
-			while (after_eq(aseq, tq->qbase + RXRPC_NR_TXQUEUE))
-				tq = tq->next;
-			seq = aseq;
-			qix = seq - tq->qbase;
-			txb = tq->bufs[qix];
-			if (after(seq, transmitted))
-				goto no_further_resend;
-
-			resend_at = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[qix]);
-			resend_at = ktime_add(resend_at, rto);
-			if (after(txb->serial, call->acks_highest_serial)) {
-				if (ktime_after(resend_at, req.now) &&
-				    ktime_before(resend_at, next_resend))
-					next_resend = resend_at;
+		if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE)
+			naks &= (1UL << tq->nr_reported_acks) - 1;
+
+		_debug("retr %16lx %u c=%08x [%x]",
+		       tq->segment_acked, tq->nr_reported_acks, call->debug_id, tq->qbase);
+		_debug("nack %16lx", naks);
+
+		while (naks) {
+			unsigned int ix = __ffs(naks);
+			struct rxrpc_txbuf *txb = tq->bufs[ix];
+
+			__clear_bit(ix, &naks);
+			if (after(txb->serial, call->acks_highest_serial))
 				continue; /* Ack point not yet reached */
-			}
 
 			rxrpc_see_txbuf(txb, rxrpc_txbuf_see_unacked);
 
 			req.tq = tq;
-			req.seq = seq;
+			req.seq = tq->qbase + ix;
 			req.n = 1;
-			rxrpc_retransmit_data(call, &req, rto);
-
-			if (after_eq(seq, call->tx_top))
-				goto no_further_resend;
+			rxrpc_retransmit_data(call, &req, rto, false);
 		}
-	}
 
-	/* Fast-forward through the Tx queue to the point the peer says it has
-	 * seen.  Anything between the soft-ACK table and that point will get
-	 * ACK'd or NACK'd in due course, so don't worry about it here; here we
-	 * need to consider retransmitting anything beyond that point.
-	 */
-	seq = call->acks_prev_seq;
-	if (after_eq(seq, call->tx_transmitted))
-		goto no_further_resend;
-	seq++;
-
-	while (after_eq(seq, tq->qbase + RXRPC_NR_TXQUEUE))
-		tq = tq->next;
-
-	while (before_eq(seq, call->tx_transmitted)) {
-		qix = seq - tq->qbase;
-		if (qix >= RXRPC_NR_TXQUEUE) {
-			tq = tq->next;
-			continue;
+		/* Anything after the soft-ACK table up to and including
+		 * ack.previousPacket will get ACK'd or NACK'd in due course,
+		 * so don't worry about those here.  We do, however, need to
+		 * consider retransmitting anything beyond that point.
+		 */
+		if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE &&
+		    after(tq_top, call->acks_prev_seq)) {
+			rxrpc_seq_t start = latest(call->acks_prev_seq,
+						   tq->qbase + tq->nr_reported_acks);
+			rxrpc_seq_t stop = earliest(tq_top, call->tx_transmitted);
+
+			_debug("unrep %x-%x", start, stop);
+			for (rxrpc_seq_t seq = start; before(seq, stop); seq++) {
+				struct rxrpc_txbuf *txb = tq->bufs[seq & RXRPC_TXQ_MASK];
+
+				if (ping_response &&
+				    before(txb->serial, call->acks_highest_serial))
+					break; /* Wasn't accounted for by a more recent ping. */
+				req.tq = tq;
+				req.seq = seq;
+				req.n = 1;
+				if (rxrpc_retransmit_data(call, &req, rto, true))
+					unacked = true;
+			}
 		}
-		txb = tq->bufs[qix];
-		resend_at = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[qix]);
-		resend_at = ktime_add(resend_at, rto);
 
-		if (ack && ack->reason == RXRPC_ACK_PING_RESPONSE &&
-		    before(txb->serial, ntohl(ack->serial)))
-			goto do_resend; /* Wasn't accounted for by a more recent ping. */
+		/* Work out the next retransmission timeout. */
+		if (ktime_before(tq->xmit_ts_base, lowest_xmit_ts)) {
+			unsigned int lowest_us = UINT_MAX;
 
-		if (ktime_after(resend_at, req.now)) {
-			if (ktime_before(resend_at, next_resend))
-				next_resend = resend_at;
-			seq++;
-			continue;
-		}
+			for (int i = 0; i < RXRPC_NR_TXQUEUE; i++)
+				if (!test_bit(i, &tq->segment_acked) &&
+				    tq->segment_xmit_ts[i] < lowest_us)
+					lowest_us = tq->segment_xmit_ts[i];
+			_debug("lowest[%x] %llx %u", tq->qbase, tq->xmit_ts_base, lowest_us);
 
-	do_resend:
-		unacked = true;
+			if (lowest_us != UINT_MAX) {
+				ktime_t lowest_ns = ktime_add_us(tq->xmit_ts_base, lowest_us);
 
-		req.tq = tq;
-		req.seq = seq;
-		req.n = 1;
-		rxrpc_retransmit_data(call, &req, rto);
-		seq++;
-	}
+				if (ktime_before(lowest_ns, lowest_xmit_ts))
+					lowest_xmit_ts = lowest_ns;
+			}
+		}
+	} while ((tq = tq->next));
+
+	if (lowest_xmit_ts < KTIME_MAX) {
+		ktime_t delay = rxrpc_get_rto_backoff(call->peer, req.did_send);
+		ktime_t resend_at = ktime_add(lowest_xmit_ts, delay);
 
-no_further_resend:
-no_resend:
-	if (resend_at < KTIME_MAX) {
-		delay = rxrpc_get_rto_backoff(call->peer, did_send);
-		resend_at = ktime_add(resend_at, delay);
+		_debug("delay %llu %lld", delay, ktime_sub(resend_at, req.now));
+		call->resend_at = resend_at;
 		trace_rxrpc_timer_set(call, resend_at - req.now,
 				      rxrpc_timer_trace_resend_reset);
+	} else {
+		call->resend_at = KTIME_MAX;
+		trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend);
 	}
-	call->resend_at = resend_at;
 
 	if (unacked)
 		rxrpc_congestion_timeout(call);
@@ -494,7 +487,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
 	if (resend &&
 	    __rxrpc_call_state(call) != RXRPC_CALL_CLIENT_RECV_REPLY &&
 	    !test_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags))
-		rxrpc_resend(call, NULL);
+		rxrpc_resend(call, 0, false);
 
 	if (test_and_clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags))
 		rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0,
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index a9682b31a4f9..bba058055c97 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -691,7 +691,6 @@ static void rxrpc_destroy_call(struct work_struct *work)
 
 	del_timer_sync(&call->timer);
 
-	rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
 	rxrpc_cleanup_tx_buffers(call);
 	rxrpc_cleanup_rx_buffers(call);
 	rxrpc_put_txbuf(call->tx_pending, rxrpc_txbuf_put_cleaned);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index c25d816aafee..6e7ff133b5aa 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -34,8 +34,6 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 					struct rxrpc_ack_summary *summary,
 					rxrpc_serial_t acked_serial)
 {
-	bool resend = false;
-
 	summary->change = rxrpc_cong_no_change;
 	summary->in_flight = (call->tx_top - call->tx_bottom) - call->acks_nr_sacks;
 
@@ -52,12 +50,13 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 	}
 
 	call->cong_cumul_acks += summary->nr_new_sacks;
+	call->cong_cumul_acks += summary->nr_new_hacks;
 	if (call->cong_cumul_acks > 255)
 		call->cong_cumul_acks = 255;
 
 	switch (call->cong_ca_state) {
 	case RXRPC_CA_SLOW_START:
-		if (summary->saw_snacks)
+		if (call->acks_nr_snacks > 0)
 			goto packet_loss_detected;
 		if (call->cong_cumul_acks > 0)
 			call->cong_cwnd += 1;
@@ -68,7 +67,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 		goto out;
 
 	case RXRPC_CA_CONGEST_AVOIDANCE:
-		if (summary->saw_snacks)
+		if (call->acks_nr_snacks > 0)
 			goto packet_loss_detected;
 
 		/* We analyse the number of packets that get ACK'd per RTT
@@ -87,7 +86,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 		goto out;
 
 	case RXRPC_CA_PACKET_LOSS:
-		if (!summary->saw_snacks)
+		if (call->acks_nr_snacks == 0)
 			goto resume_normality;
 
 		if (summary->new_low_snack) {
@@ -108,7 +107,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 		call->cong_cwnd = call->cong_ssthresh + 3;
 		call->cong_extra = 0;
 		call->cong_dup_acks = 0;
-		resend = true;
+		summary->need_retransmit = true;
 		goto out;
 
 	case RXRPC_CA_FAST_RETRANSMIT:
@@ -119,12 +118,12 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 			if (call->cong_dup_acks == 2) {
 				summary->change = rxrpc_cong_retransmit_again;
 				call->cong_dup_acks = 0;
-				resend = true;
+				summary->need_retransmit = true;
 			}
 		} else {
 			summary->change = rxrpc_cong_progress;
 			call->cong_cwnd = call->cong_ssthresh;
-			if (!summary->saw_snacks)
+			if (call->acks_nr_snacks == 0)
 				goto resume_normality;
 		}
 		goto out;
@@ -149,8 +148,6 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 	if (call->cong_cwnd >= RXRPC_TX_MAX_WINDOW)
 		call->cong_cwnd = RXRPC_TX_MAX_WINDOW;
 	trace_rxrpc_congest(call, summary, acked_serial);
-	if (resend)
-		rxrpc_resend(call, skb);
 	return;
 
 packet_loss_detected:
@@ -212,6 +209,13 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 	trace_rxrpc_tx_rotate(call, seq, to);
 	trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate);
 
+	if (call->acks_lowest_nak == call->tx_bottom) {
+		call->acks_lowest_nak = to;
+	} else if (after(to, call->acks_lowest_nak)) {
+		summary->new_low_snack = true;
+		call->acks_lowest_nak = to;
+	}
+
 	/* We may have a left over fully-consumed buffer at the front that we
 	 * couldn't drop before (rotate_and_keep below).
 	 */
@@ -231,6 +235,25 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 			set_bit(RXRPC_CALL_TX_LAST, &call->flags);
 			rot_last = true;
 		}
+
+		if (ix == tq->nr_reported_acks) {
+			/* Packet directly hard ACK'd. */
+			tq->nr_reported_acks++;
+			summary->nr_new_hacks++;
+			__set_bit(ix, &tq->segment_acked);
+			trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_hack);
+		} else if (test_bit(ix, &tq->segment_acked)) {
+			/* Soft ACK -> hard ACK. */
+			call->acks_nr_sacks--;
+			trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_sack);
+		} else {
+			/* Soft NAK -> hard ACK. */
+			call->acks_nr_snacks--;
+			summary->nr_new_hacks++;
+			__set_bit(ix, &tq->segment_acked);
+			trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_snak);
+		}
+
 		rxrpc_put_txbuf(tq->bufs[ix], rxrpc_txbuf_put_rotated);
 		tq->bufs[ix] = NULL;
 
@@ -268,13 +291,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 
 	_debug("%x,%x,%x,%d", to, call->tx_bottom, call->tx_top, rot_last);
 
-	if (call->acks_lowest_nak == call->tx_bottom) {
-		call->acks_lowest_nak = to;
-	} else if (after(to, call->acks_lowest_nak)) {
-		summary->new_low_snack = true;
-		call->acks_lowest_nak = to;
-	}
-
 	wake_up(&call->waitq);
 	return rot_last;
 }
@@ -293,11 +309,6 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 	call->resend_at = KTIME_MAX;
 	trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend);
 
-	if (unlikely(call->cong_last_nack)) {
-		rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
-		call->cong_last_nack = NULL;
-	}
-
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
@@ -770,40 +781,92 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb
 	wake_up(&call->waitq);
 }
 
+#if defined(CONFIG_X86) && __GNUC__ && !defined(__clang__)
+/* Clang doesn't support the %z constraint modifier */
+#define shiftr_adv_rotr(shift_from, rotate_into) ({			\
+			asm(" shr%z1 %1\n"				\
+			    " inc %0\n"					\
+			    " rcr%z2 %2\n"				\
+			    : "+d"(shift_from), "+m"(*(shift_from)), "+rm"(rotate_into) \
+			    );						\
+		})
+#else
+#define shiftr_adv_rotr(shift_from, rotate_into) ({			\
+			typeof(rotate_into) __bit0 = *(shift_from) & 1;	\
+			*(shift_from) >>= 1;				\
+			shift_from++;					\
+			rotate_into >>= 1;				\
+			rotate_into |= __bit0 << (sizeof(rotate_into) * 8 - 1); \
+		})
+#endif
+
 /*
- * Determine how many nacks from the previous ACK have now been satisfied.
+ * Process a batch of soft ACKs specific to a transmission queue segment.
  */
-static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
-					       struct rxrpc_ack_summary *summary,
-					       rxrpc_seq_t hard_ack)
+static void rxrpc_input_soft_ack_tq(struct rxrpc_call *call,
+				    struct rxrpc_ack_summary *summary,
+				    struct rxrpc_txqueue *tq,
+				    unsigned long extracted_acks,
+				    int nr_reported,
+				    rxrpc_seq_t seq,
+				    rxrpc_seq_t *lowest_nak)
 {
-	struct sk_buff *skb = call->cong_last_nack;
-	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	unsigned int i, new_acks = 0, retained_nacks = 0;
-	rxrpc_seq_t seq = hard_ack + 1, old_seq = sp->ack.first_ack;
-	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
+	unsigned long old_reported, flipped, new_acks, a_to_n, n_to_a;
+	int new, a, n;
+
+	old_reported = ~0UL >> (RXRPC_NR_TXQUEUE - tq->nr_reported_acks);
+	_enter("{%x,%lx,%d},%lx,%d,%x",
+	       tq->qbase, tq->segment_acked, tq->nr_reported_acks,
+	       extracted_acks, nr_reported, seq);
+
+	_debug("[%x]", tq->qbase);
+	_debug("tq %16lx %u", tq->segment_acked, tq->nr_reported_acks);
+	_debug("sack %16lx %u", extracted_acks, nr_reported);
+
+	/* See how many previously logged ACKs/NAKs have flipped. */
+	flipped = (tq->segment_acked ^ extracted_acks) & old_reported;
+	if (flipped) {
+		n_to_a = ~tq->segment_acked & flipped; /* Old NAK -> ACK */
+		a_to_n = tq->segment_acked & flipped; /* Old ACK -> NAK */
+		a = hweight_long(n_to_a);
+		n = hweight_long(a_to_n);
+		_debug("flip %16lx", flipped);
+		_debug("ntoa %16lx %d", n_to_a, a);
+		_debug("aton %16lx %d", a_to_n, n);
+		call->acks_nr_sacks += a - n;
+		call->acks_nr_snacks += n - a;
+		summary->nr_new_sacks += a;
+		summary->nr_new_snacks += n;
+	}
 
-	if (after_eq(seq, old_seq + sp->ack.nr_acks)) {
-		summary->nr_new_sacks += sp->ack.nr_nacks;
-		summary->nr_new_sacks += seq - (old_seq + sp->ack.nr_acks);
-		summary->nr_retained_snacks = 0;
-	} else if (seq == old_seq) {
-		summary->nr_retained_snacks = sp->ack.nr_nacks;
-	} else {
-		for (i = 0; i < sp->ack.nr_acks; i++) {
-			if (acks[i] == RXRPC_ACK_TYPE_NACK) {
-				if (before(old_seq + i, seq))
-					new_acks++;
-				else
-					retained_nacks++;
-			}
+	/* See how many new ACKs/NAKs have been acquired. */
+	new = nr_reported - tq->nr_reported_acks;
+	if (new > 0) {
+		new_acks = extracted_acks & ~old_reported;
+		if (new_acks) {
+			a = hweight_long(new_acks);
+			n = new - a;
+			_debug("new_a %16lx new=%d a=%d n=%d", new_acks, new, a, n);
+			call->acks_nr_sacks += a;
+			call->acks_nr_snacks += n;
+			summary->nr_new_sacks += a;
+			summary->nr_new_snacks += n;
+		} else {
+			call->acks_nr_snacks += new;
+			summary->nr_new_snacks += new;
 		}
-
-		summary->nr_new_sacks += new_acks;
-		summary->nr_retained_snacks = retained_nacks;
 	}
 
-	return old_seq + sp->ack.nr_acks - 1;
+	tq->nr_reported_acks = nr_reported;
+	tq->segment_acked = extracted_acks;
+	trace_rxrpc_apply_acks(call, tq);
+
+	if (extracted_acks != ~0UL) {
+		rxrpc_seq_t lowest = seq + ffz(extracted_acks);
+
+		if (before(lowest, *lowest_nak))
+			*lowest_nak = lowest;
+	}
 }
 
 /*
@@ -817,39 +880,50 @@ static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
  */
 static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 				  struct rxrpc_ack_summary *summary,
-				  struct sk_buff *skb,
-				  rxrpc_seq_t since)
+				  struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	unsigned int i, old_nacks = 0;
-	rxrpc_seq_t lowest_nak = call->acks_hard_ack + sp->ack.nr_acks + 1;
-	rxrpc_seq_t seq = call->acks_hard_ack;
+	struct rxrpc_txqueue *tq = call->tx_queue;
+	unsigned long extracted = ~0UL;
+	unsigned int nr = 0;
+	rxrpc_seq_t seq = call->acks_hard_ack + 1;
+	rxrpc_seq_t lowest_nak = seq + sp->ack.nr_acks;
 	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
 
-	for (i = 0; i < sp->ack.nr_acks; i++) {
-		seq++;
-		if (acks[i] == RXRPC_ACK_TYPE_ACK) {
-			call->acks_nr_sacks++;
-			if (after(seq, since))
-				summary->nr_new_sacks++;
-		} else {
-			summary->saw_snacks = true;
-			if (before_eq(seq, since)) {
-				/* Overlap with previous ACK */
-				old_nacks++;
-			} else {
-				summary->nr_new_snacks++;
-				sp->ack.nr_nacks++;
-			}
+	_enter("%x,%x,%u", tq->qbase, seq, sp->ack.nr_acks);
+
+	while (after(seq, tq->qbase + RXRPC_NR_TXQUEUE - 1))
+		tq = tq->next;
 
-			if (before(seq, lowest_nak))
-				lowest_nak = seq;
+	for (unsigned int i = 0; i < sp->ack.nr_acks; i++) {
+		/* Decant ACKs until we hit a txqueue boundary. */
+		shiftr_adv_rotr(acks, extracted);
+		if (i == 256) {
+			acks -= i;
+			i = 0;
 		}
+		seq++;
+		nr++;
+		if ((seq & RXRPC_TXQ_MASK) != 0)
+			continue;
+
+		_debug("bound %16lx %u", extracted, nr);
+
+		rxrpc_input_soft_ack_tq(call, summary, tq, extracted, RXRPC_NR_TXQUEUE,
+					seq - RXRPC_NR_TXQUEUE, &lowest_nak);
+		extracted = ~0UL;
+		nr = 0;
+		tq = tq->next;
+		prefetch(tq);
 	}
 
-	if (lowest_nak != call->acks_lowest_nak) {
-		call->acks_lowest_nak = lowest_nak;
-		summary->new_low_snack = true;
+	if (nr) {
+		unsigned int nr_reported = seq & RXRPC_TXQ_MASK;
+
+		extracted >>= RXRPC_NR_TXQUEUE - nr_reported;
+		_debug("tail %16lx %u", extracted, nr_reported);
+		rxrpc_input_soft_ack_tq(call, summary, tq, extracted, nr_reported,
+					seq & ~RXRPC_TXQ_MASK, &lowest_nak);
 	}
 
 	/* We *can* have more nacks than we did - the peer is permitted to drop
@@ -857,9 +931,14 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 	 * possible for the nack distribution to change whilst the number of
 	 * nacks stays the same or goes down.
*/ - if (old_nacks < summary->nr_retained_snacks) - summary->nr_new_sacks += summary->nr_retained_snacks - old_nacks; - summary->nr_retained_snacks = old_nacks; + if (lowest_nak != call->acks_lowest_nak) { + call->acks_lowest_nak = lowest_nak; + summary->new_low_snack = true; + } + + _debug("summary A=%d+%d N=%d+%d", + call->acks_nr_sacks, summary->nr_new_sacks, + call->acks_nr_snacks, summary->nr_new_snacks); } /* @@ -902,7 +981,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) struct rxrpc_acktrailer trailer; struct rxrpc_skb_priv *sp = rxrpc_skb(skb); rxrpc_serial_t ack_serial, acked_serial; - rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since; + rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt; int nr_acks, offset, ioffset; _enter(""); @@ -920,6 +999,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) trace_rxrpc_rx_ack(call, sp); rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]); + prefetch(call->tx_queue); if (acked_serial != 0) { switch (summary.ack_reason) { @@ -980,16 +1060,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (nr_acks > 0) skb_condense(skb); - if (call->cong_last_nack) { - since = rxrpc_input_check_prev_ack(call, &summary, hard_ack); - rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); - call->cong_last_nack = NULL; - } else { - summary.nr_new_sacks = hard_ack - call->acks_hard_ack; - call->acks_lowest_nak = hard_ack + nr_acks; - since = hard_ack; - } - call->acks_latest_ts = skb->tstamp; call->acks_hard_ack = hard_ack; call->acks_prev_seq = prev_pkt; @@ -1037,9 +1107,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (nr_acks > 0) { if (offset > (int)skb->len - nr_acks) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack); - rxrpc_input_soft_acks(call, &summary, skb, since); - rxrpc_get_skb(skb, rxrpc_skb_get_last_nack); - call->cong_last_nack = skb; + rxrpc_input_soft_acks(call, &summary, 
skb); } if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) && @@ -1049,6 +1117,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_propose_ack_ping_for_lost_reply); rxrpc_congestion_management(call, skb, &summary, acked_serial); + if (summary.need_retransmit) + rxrpc_resend(call, ack_serial, summary.ack_reason == RXRPC_ACK_PING_RESPONSE); send_response: if (summary.ack_reason == RXRPC_ACK_PING) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 7ed928b6f0e1..978c2dc6a7d4 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -461,7 +461,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, len += sizeof(*jumbo); } - trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, false); + trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags | flags, false); kv->iov_len = len; return len; } @@ -522,6 +522,13 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se } /* Set timeouts */ + if (call->peer->rtt_count > 1) { + ktime_t delay = rxrpc_get_rto_backoff(call->peer, false); + + call->ack_lost_at = ktime_add(req->now, delay); + trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); + } + if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) { ktime_t delay = ms_to_ktime(READ_ONCE(call->next_rx_timo)); @@ -596,6 +603,7 @@ void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req ret = 0; trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags, true); + conn->peer->last_tx_at = ktime_get_seconds(); goto done; } } diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index dfbf9f4b24b6..381b25597f4e 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -299,6 +299,9 @@ static int rxrpc_alloc_txqueue(struct sock *sk, struct rxrpc_call *call) kfree(tq); return -ENOMEM; } else { + /* We start at seq 1, so pretend seq 0 is hard-acked. 
*/ + tq->nr_reported_acks = 1; + tq->segment_acked = 1UL; tq->qbase = 0; call->tx_qbase = 0; call->send_queue = tq; From patchwork Wed Dec 4 07:46:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893344 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 24/39] rxrpc: Store the DATA serial in the txqueue and use this in RTT calc Date: Wed, 4 Dec 2024 07:46:52 +0000 Message-ID: <20241204074710.990092-25-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Store the serial number set on a DATA packet at the point of transmission in the rxrpc_txqueue struct and, when an ACK is received, match the reference number in the ACK by trawling the txqueue rather than sharing an RTT table with ACK RTT. This can be done as part of Tx queue rotation. This means we have many more RTT samples available, and the search is faster because all the serial numbers are packed together into a few cachelines rather than being hung off different txbufs. Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 14 ++---- net/rxrpc/ar-internal.h | 4 ++ net/rxrpc/call_event.c | 8 +-- net/rxrpc/input.c | 94 +++++++++++++++++++++++------------- net/rxrpc/output.c | 6 ++- 5 files changed, 79 insertions(+), 47 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 609522a5bd0f..798bea0853c4 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -337,11 +337,10 @@ E_(rxrpc_rtt_tx_ping, "PING") #define rxrpc_rtt_rx_traces \ - EM(rxrpc_rtt_rx_other_ack, "OACK") \ + EM(rxrpc_rtt_rx_data_ack, "DACK") \ EM(rxrpc_rtt_rx_obsolete, "OBSL") \ EM(rxrpc_rtt_rx_lost, "LOST") \ - EM(rxrpc_rtt_rx_ping_response, "PONG") \ - E_(rxrpc_rtt_rx_requested_ack, "RACK") + E_(rxrpc_rtt_rx_ping_response, "PONG") #define rxrpc_timer_traces \ EM(rxrpc_timer_trace_delayed_ack, "DelayAck ") \ @@ -1695,10 +1694,9 @@ TRACE_EVENT(rxrpc_retransmit, ); TRACE_EVENT(rxrpc_congest, - TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, - rxrpc_serial_t ack_serial), + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary), - TP_ARGS(call, summary, ack_serial), + TP_ARGS(call, summary), TP_STRUCT__entry( __field(unsigned int, call) @@ -1706,7 +1704,6 @@ TRACE_EVENT(rxrpc_congest, __field(rxrpc_seq_t, hard_ack) __field(rxrpc_seq_t, top) __field(rxrpc_seq_t, lowest_nak) - __field(rxrpc_serial_t, ack_serial) __field(u16, nr_sacks) __field(u16, nr_snacks) __field(u16, cwnd) @@ -1722,7 +1719,6 @@ TRACE_EVENT(rxrpc_congest, __entry->hard_ack = call->acks_hard_ack; __entry->top = call->tx_top; __entry->lowest_nak = call->acks_lowest_nak; - __entry->ack_serial = ack_serial; __entry->nr_sacks = call->acks_nr_sacks; __entry->nr_snacks = call->acks_nr_snacks; __entry->cwnd = call->cong_cwnd; @@ -1734,7 +1730,7 @@ TRACE_EVENT(rxrpc_congest, TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u 
ss=%u A=%u+%u/%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s", __entry->call, - __entry->ack_serial, + __entry->sum.acked_serial, __print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names), __entry->hard_ack, __print_symbolic(__entry->ca_state, rxrpc_ca_states), diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 9a70f0b86570..297be421639c 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -769,6 +769,7 @@ struct rxrpc_call { * Summary of a new ACK and the changes it made to the Tx buffer packet states. */ struct rxrpc_ack_summary { + rxrpc_serial_t acked_serial; /* Serial number ACK'd */ u16 in_flight; /* Number of unreceived transmissions */ u16 nr_new_hacks; /* Number of rotated new ACKs */ u16 nr_new_sacks; /* Number of new soft ACKs in packet */ @@ -777,6 +778,7 @@ struct rxrpc_ack_summary { bool new_low_snack:1; /* T if new low soft NACK found */ bool retrans_timeo:1; /* T if reTx due to timeout happened */ bool need_retransmit:1; /* T if we need transmission */ + bool rtt_sample_avail:1; /* T if RTT sample available */ u8 /*enum rxrpc_congest_change*/ change; }; @@ -859,12 +861,14 @@ struct rxrpc_txqueue { unsigned long segment_acked; /* Bit-per-buf: Set if ACK'd */ unsigned long segment_lost; /* Bit-per-buf: Set if declared lost */ unsigned long segment_retransmitted; /* Bit-per-buf: Set if retransmitted */ + unsigned long rtt_samples; /* Bit-per-buf: Set if available for RTT */ /* The arrays we want to pack into as few cache lines as possible. 
*/ struct { #define RXRPC_NR_TXQUEUE BITS_PER_LONG #define RXRPC_TXQ_MASK (RXRPC_NR_TXQUEUE - 1) struct rxrpc_txbuf *bufs[RXRPC_NR_TXQUEUE]; + unsigned int segment_serial[RXRPC_NR_TXQUEUE]; unsigned int segment_xmit_ts[RXRPC_NR_TXQUEUE]; } ____cacheline_aligned; }; diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index e25921d39d4d..f71773b18e22 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -159,11 +159,11 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_ rxrpc_seq_t stop = earliest(tq_top, call->tx_transmitted); _debug("unrep %x-%x", start, stop); - for (rxrpc_seq_t seq = start; before(seq, stop); seq++) { - struct rxrpc_txbuf *txb = tq->bufs[seq & RXRPC_TXQ_MASK]; + for (rxrpc_seq_t seq = start; before_eq(seq, stop); seq++) { + rxrpc_serial_t serial = tq->segment_serial[seq & RXRPC_TXQ_MASK]; if (ping_response && - before(txb->serial, call->acks_highest_serial)) + before(serial, call->acks_highest_serial)) break; /* Wasn't accounted for by a more recent ping. */ req.tq = tq; req.seq = seq; @@ -198,7 +198,7 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_ _debug("delay %llu %lld", delay, ktime_sub(resend_at, req.now)); call->resend_at = resend_at; - trace_rxrpc_timer_set(call, resend_at - req.now, + trace_rxrpc_timer_set(call, ktime_sub(resend_at, req.now), rxrpc_timer_trace_resend_reset); } else { call->resend_at = KTIME_MAX; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 6e7ff133b5aa..41b4fb56f96c 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -30,9 +30,7 @@ static void rxrpc_proto_abort(struct rxrpc_call *call, rxrpc_seq_t seq, * Do TCP-style congestion management [RFC 5681]. 
*/ static void rxrpc_congestion_management(struct rxrpc_call *call, - struct sk_buff *skb, - struct rxrpc_ack_summary *summary, - rxrpc_serial_t acked_serial) + struct rxrpc_ack_summary *summary) { summary->change = rxrpc_cong_no_change; summary->in_flight = (call->tx_top - call->tx_bottom) - call->acks_nr_sacks; @@ -44,7 +42,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, if (call->cong_cwnd >= call->cong_ssthresh && call->cong_ca_state == RXRPC_CA_SLOW_START) { call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; - call->cong_tstamp = skb->tstamp; + call->cong_tstamp = call->acks_latest_ts; call->cong_cumul_acks = 0; } } @@ -62,7 +60,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, call->cong_cwnd += 1; if (call->cong_cwnd >= call->cong_ssthresh) { call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; - call->cong_tstamp = skb->tstamp; + call->cong_tstamp = call->acks_latest_ts; } goto out; @@ -75,12 +73,12 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, */ if (call->peer->rtt_count == 0) goto out; - if (ktime_before(skb->tstamp, + if (ktime_before(call->acks_latest_ts, ktime_add_us(call->cong_tstamp, call->peer->srtt_us >> 3))) goto out_no_clear_ca; summary->change = rxrpc_cong_rtt_window_end; - call->cong_tstamp = skb->tstamp; + call->cong_tstamp = call->acks_latest_ts; if (call->cong_cumul_acks >= call->cong_cwnd) call->cong_cwnd++; goto out; @@ -137,7 +135,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, summary->change = rxrpc_cong_cleared_nacks; call->cong_dup_acks = 0; call->cong_extra = 0; - call->cong_tstamp = skb->tstamp; + call->cong_tstamp = call->acks_latest_ts; if (call->cong_cwnd < call->cong_ssthresh) call->cong_ca_state = RXRPC_CA_SLOW_START; else @@ -147,7 +145,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, out_no_clear_ca: if (call->cong_cwnd >= RXRPC_TX_MAX_WINDOW) call->cong_cwnd = RXRPC_TX_MAX_WINDOW; - trace_rxrpc_congest(call, summary, 
acked_serial); + trace_rxrpc_congest(call, summary); return; packet_loss_detected: @@ -194,11 +192,29 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) call->cong_cwnd = umax(call->cong_cwnd / 2, RXRPC_MIN_CWND); } +/* + * Add an RTT sample derived from an ACK'd DATA packet. + */ +static void rxrpc_add_data_rtt_sample(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + int ix, + rxrpc_serial_t ack_serial) +{ + rxrpc_peer_add_rtt(call, rxrpc_rtt_rx_data_ack, -1, + summary->acked_serial, ack_serial, + ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]), + call->acks_latest_ts); + summary->rtt_sample_avail = false; + __clear_bit(ix, &tq->rtt_samples); /* Prevent repeat RTT sample */ +} + /* * Apply a hard ACK by advancing the Tx window. */ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, - struct rxrpc_ack_summary *summary) + struct rxrpc_ack_summary *summary, + rxrpc_serial_t ack_serial) { struct rxrpc_txqueue *tq = call->tx_queue; rxrpc_seq_t seq = call->tx_bottom + 1; @@ -236,6 +252,11 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, rot_last = true; } + if (summary->rtt_sample_avail && + summary->acked_serial == tq->segment_serial[ix] && + test_bit(ix, &tq->rtt_samples)) + rxrpc_add_data_rtt_sample(call, summary, tq, ix, ack_serial); + if (ix == tq->nr_reported_acks) { /* Packet directly hard ACK'd. */ tq->nr_reported_acks++; @@ -348,7 +369,7 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call) } if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) { - if (!rxrpc_rotate_tx_window(call, top, &summary)) { + if (!rxrpc_rotate_tx_window(call, top, &summary, 0)) { rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply); return false; } @@ -800,6 +821,19 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb }) #endif +/* + * Deal with RTT samples from soft ACKs. 
+ */ +static void rxrpc_input_soft_rtt(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + rxrpc_serial_t ack_serial) +{ + for (int ix = 0; ix < RXRPC_NR_TXQUEUE; ix++) + if (summary->acked_serial == tq->segment_serial[ix]) + return rxrpc_add_data_rtt_sample(call, summary, tq, ix, ack_serial); +} + /* * Process a batch of soft ACKs specific to a transmission queue segment. */ @@ -909,6 +943,8 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, _debug("bound %16lx %u", extracted, nr); + if (summary->rtt_sample_avail) + rxrpc_input_soft_rtt(call, summary, tq, sp->hdr.serial); rxrpc_input_soft_ack_tq(call, summary, tq, extracted, RXRPC_NR_TXQUEUE, seq - RXRPC_NR_TXQUEUE, &lowest_nak); extracted = ~0UL; @@ -980,7 +1016,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) struct rxrpc_ack_summary summary = { 0 }; struct rxrpc_acktrailer trailer; struct rxrpc_skb_priv *sp = rxrpc_skb(skb); - rxrpc_serial_t ack_serial, acked_serial; + rxrpc_serial_t ack_serial; rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt; int nr_acks, offset, ioffset; @@ -989,11 +1025,11 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) offset = sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket); ack_serial = sp->hdr.serial; - acked_serial = sp->ack.acked_serial; first_soft_ack = sp->ack.first_ack; prev_pkt = sp->ack.prev_ack; nr_acks = sp->ack.nr_acks; hard_ack = first_soft_ack - 1; + summary.acked_serial = sp->ack.acked_serial; summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ? 
sp->ack.reason : RXRPC_ACK__INVALID); @@ -1001,21 +1037,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]); prefetch(call->tx_queue); - if (acked_serial != 0) { - switch (summary.ack_reason) { - case RXRPC_ACK_PING_RESPONSE: - rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial, - rxrpc_rtt_rx_ping_response); - break; - case RXRPC_ACK_REQUESTED: - rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial, - rxrpc_rtt_rx_requested_ack); - break; - default: - rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial, - rxrpc_rtt_rx_other_ack); - break; - } + if (summary.acked_serial != 0) { + if (summary.ack_reason == RXRPC_ACK_PING_RESPONSE) + rxrpc_complete_rtt_probe(call, skb->tstamp, summary.acked_serial, + ack_serial, rxrpc_rtt_rx_ping_response); + else + summary.rtt_sample_avail = true; } /* If we get an EXCEEDS_WINDOW ACK from the server, it probably @@ -1068,8 +1095,9 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) case RXRPC_ACK_PING: break; default: - if (acked_serial && after(acked_serial, call->acks_highest_serial)) - call->acks_highest_serial = acked_serial; + if (summary.acked_serial && + after(summary.acked_serial, call->acks_highest_serial)) + call->acks_highest_serial = summary.acked_serial; break; } @@ -1098,7 +1126,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_sack_overflow); if (after(hard_ack, call->tx_bottom)) { - if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) { + if (rxrpc_rotate_tx_window(call, hard_ack, &summary, ack_serial)) { rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ack); goto send_response; } @@ -1116,7 +1144,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_propose_ping(call, ack_serial, rxrpc_propose_ack_ping_for_lost_reply); - 
rxrpc_congestion_management(call, skb, &summary, acked_serial); + rxrpc_congestion_management(call, &summary); if (summary.need_retransmit) rxrpc_resend(call, ack_serial, summary.ack_reason == RXRPC_ACK_PING_RESPONSE); @@ -1136,7 +1164,7 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb) { struct rxrpc_ack_summary summary = { 0 }; - if (rxrpc_rotate_tx_window(call, call->tx_top, &summary)) + if (rxrpc_rotate_tx_window(call, call->tx_top, &summary, 0)) rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall); } diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 978c2dc6a7d4..20bf45317264 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -436,7 +436,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, trace_rxrpc_req_ack(call->debug_id, txb->seq, why); if (why != rxrpc_reqack_no_srv_last) { flags |= RXRPC_REQUEST_ACK; - rxrpc_begin_rtt_probe(call, serial, req->now, rxrpc_rtt_tx_data); + trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_data, -1, serial); call->peer->rtt_last_req = req->now; } dont_set_request_ack: @@ -508,6 +508,10 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se _debug("prep[%u] tq=%x q=%x", i, tq->qbase, seq); tq->segment_xmit_ts[ix] = xmit_ts; + tq->segment_serial[ix] = serial; + if (i + 1 == req->n) + /* Only sample the last subpacket in a jumbo. 
*/ + __set_bit(ix, &tq->rtt_samples); len += rxrpc_prepare_data_subpacket(call, req, txb, serial, i); serial++; seq++; From patchwork Wed Dec 4 07:46:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893343 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 25/39] rxrpc: Don't use received skbuff timestamps Date: Wed, 4 Dec 2024 07:46:53 +0000 Message-ID: <20241204074710.990092-26-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Don't use received skbuff timestamps, but rather set a timestamp when an ACK is processed so that the time taken to get to rxrpc_input_ack() is included in the RTT. The timestamp of the latest ACK received is tracked in call->acks_latest_ts. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/input.c | 19 ++++++++++--------- net/rxrpc/local_object.c | 3 --- 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 41b4fb56f96c..c682e95e15dc 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1037,14 +1037,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]); prefetch(call->tx_queue); - if (summary.acked_serial != 0) { - if (summary.ack_reason == RXRPC_ACK_PING_RESPONSE) - rxrpc_complete_rtt_probe(call, skb->tstamp, summary.acked_serial, - ack_serial, rxrpc_rtt_rx_ping_response); - else - summary.rtt_sample_avail = true; - } - /* If we get an EXCEEDS_WINDOW ACK from the server, it probably * indicates that the client address changed due to NAT. The server * lost the call because it switched to a different peer.
@@ -1087,7 +1079,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (nr_acks > 0) skb_condense(skb); - call->acks_latest_ts = skb->tstamp; + call->acks_latest_ts = ktime_get_real(); call->acks_hard_ack = hard_ack; call->acks_prev_seq = prev_pkt; @@ -1108,6 +1100,15 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (hard_ack + 1 == 0) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero); + if (summary.acked_serial != 0) { + if (summary.ack_reason == RXRPC_ACK_PING_RESPONSE) + rxrpc_complete_rtt_probe(call, call->acks_latest_ts, + summary.acked_serial, ack_serial, + rxrpc_rtt_rx_ping_response); + else + summary.rtt_sample_avail = true; + } + /* Ignore ACKs unless we are or have just been transmitting. */ switch (__rxrpc_call_state(call)) { case RXRPC_CALL_CLIENT_SEND_REQUEST: diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 2792d2304605..a74a4b43904f 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -215,9 +215,6 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) /* we want to set the don't fragment bit */ rxrpc_local_dont_fragment(local, true); - - /* We want receive timestamps. 
*/ - sock_enable_timestamps(usk); break; default: From patchwork Wed Dec 4 07:46:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893345 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 26/39] rxrpc: Generate rtt_min Date: Wed, 4 Dec 2024 07:46:54 +0000 Message-ID: <20241204074710.990092-27-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Generate rtt_min as this is required by RACK-TLP. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- lib/win_minmax.c | 1 + net/rxrpc/ar-internal.h | 2 ++ net/rxrpc/rtt.c | 20 ++++++++++++++++---- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/lib/win_minmax.c b/lib/win_minmax.c index ec10506834b6..1682e614309c 100644 --- a/lib/win_minmax.c +++ b/lib/win_minmax.c @@ -97,3 +97,4 @@ u32 minmax_running_min(struct minmax *m, u32 win, u32 t, u32 meas) return minmax_subwin_update(m, win, &val); } +EXPORT_SYMBOL(minmax_running_min); diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 297be421639c..d0d0ab453909 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -371,6 +371,8 @@ struct rxrpc_peer { spinlock_t rtt_input_lock; /* RTT lock for input routine */ ktime_t rtt_last_req; /* Time of last RTT request */ unsigned int rtt_count; /* Number of samples we've got */ + unsigned int rtt_taken; /* Number of samples taken (wrapping) */ + struct minmax min_rtt; /* Estimated minimum RTT */ u32 srtt_us; /* smoothed round trip time << 3 in usecs */ u32 mdev_us; /* medium deviation */ diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c index 6dc51486b5a6..8048467f4bee 100644 --- a/net/rxrpc/rtt.c +++ b/net/rxrpc/rtt.c @@ -127,16
+127,27 @@ static void rxrpc_set_rto(struct rxrpc_peer *peer) peer->rto_us = rxrpc_bound_rto(rto); } -static void rxrpc_ack_update_rtt(struct rxrpc_peer *peer, long rtt_us) +static void rxrpc_update_rtt_min(struct rxrpc_peer *peer, ktime_t resp_time, long rtt_us) +{ + /* Window size 5mins in approx usec (ipv4.sysctl_tcp_min_rtt_wlen) */ + u32 wlen_us = 5ULL * NSEC_PER_SEC / 1024; + + minmax_running_min(&peer->min_rtt, wlen_us, resp_time / 1024, + (u32)rtt_us ? : jiffies_to_usecs(1)); +} + +static void rxrpc_ack_update_rtt(struct rxrpc_peer *peer, ktime_t resp_time, long rtt_us) { if (rtt_us < 0) return; - //rxrpc_update_rtt_min(peer, rtt_us); + /* Update RACK min RTT [RFC8985 6.1 Step 1]. */ + rxrpc_update_rtt_min(peer, resp_time, rtt_us); + rxrpc_rtt_estimator(peer, rtt_us); rxrpc_set_rto(peer); - /* RFC6298: only reset backoff on valid RTT measurement. */ + /* Only reset backoff on valid RTT measurement [RFC6298]. */ peer->backoff = 0; } @@ -157,9 +168,10 @@ void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, return; spin_lock(&peer->rtt_input_lock); - rxrpc_ack_update_rtt(peer, rtt_us); + rxrpc_ack_update_rtt(peer, resp_time, rtt_us); if (peer->rtt_count < 3) peer->rtt_count++; + peer->rtt_taken++; spin_unlock(&peer->rtt_input_lock); trace_rxrpc_rtt_rx(call, why, rtt_slot, send_serial, resp_serial, From patchwork Wed Dec 4 07:46:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893346 X-Patchwork-Delegate: kuba@kernel.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A97C91C1AA9 for ; Wed, 4 Dec 2024 07:49:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; 
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 27/39] rxrpc: Adjust the rxrpc_rtt_rx tracepoint
Date: Wed, 4 Dec 2024 07:46:55 +0000
Message-ID: <20241204074710.990092-28-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Adjust the rxrpc_rtt_rx tracepoint in the following ways:

 (1) Display the collected RTT sample in the rxrpc_rtt_rx trace.

 (2) Move the division of srtt by 8 into the TP_printk() rather than doing it before invoking the tracepoint.

 (3) Display the min_rtt value.

Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 14 ++++++++++---- net/rxrpc/input.c | 4 ++-- net/rxrpc/rtt.c | 2 +- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 798bea0853c4..6e929f4448ac 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -1415,9 +1415,9 @@ TRACE_EVENT(rxrpc_rtt_rx, TP_PROTO(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, int slot, rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial, - u32 rtt, u32 rto), + u32 rtt, u32 srtt, u32 rto), - TP_ARGS(call, why, slot, send_serial, resp_serial, rtt, rto), + TP_ARGS(call, why, slot, send_serial, resp_serial, rtt, srtt, rto), TP_STRUCT__entry( __field(unsigned int, call) @@ -1426,7 +1426,9 @@ TRACE_EVENT(rxrpc_rtt_rx, __field(rxrpc_serial_t, send_serial) __field(rxrpc_serial_t, resp_serial) __field(u32, rtt) + __field(u32, srtt) __field(u32, rto) + __field(u32, min_rtt) ), TP_fast_assign( @@ -1436,17 +1438,21 @@ TRACE_EVENT(rxrpc_rtt_rx, __entry->send_serial = send_serial; __entry->resp_serial = resp_serial; __entry->rtt = rtt; + __entry->srtt = srtt; __entry->rto = rto; + __entry->min_rtt = minmax_get(&call->peer->min_rtt) ), - TP_printk("c=%08x [%d] %s sr=%08x rr=%08x rtt=%u rto=%u", + TP_printk("c=%08x [%d] %s sr=%08x rr=%08x rtt=%u srtt=%u rto=%u min=%u", __entry->call, __entry->slot, __print_symbolic(__entry->why, rxrpc_rtt_rx_traces), __entry->send_serial, __entry->resp_serial, __entry->rtt, - __entry->rto) + __entry->srtt / 8, + __entry->rto, + __entry->min_rtt) ); TRACE_EVENT(rxrpc_timer_set, diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index c682e95e15dc..1eb9c22aba51 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -740,7 +740,7 @@ static void rxrpc_complete_rtt_probe(struct rxrpc_call *call, */ if (after(acked_serial, orig_serial)) { 
trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_obsolete, i, - orig_serial, acked_serial, 0, 0); + orig_serial, acked_serial, 0, 0, 0); clear_bit(i + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail); smp_wmb(); set_bit(i, &call->rtt_avail); @@ -748,7 +748,7 @@ static void rxrpc_complete_rtt_probe(struct rxrpc_call *call, } if (!matched) - trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_lost, 9, 0, acked_serial, 0, 0); + trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_lost, 9, 0, acked_serial, 0, 0, 0); } /* diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c index 8048467f4bee..e0b7d99854b4 100644 --- a/net/rxrpc/rtt.c +++ b/net/rxrpc/rtt.c @@ -175,7 +175,7 @@ void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, spin_unlock(&peer->rtt_input_lock); trace_rxrpc_rtt_rx(call, why, rtt_slot, send_serial, resp_serial, - peer->srtt_us >> 3, peer->rto_us); + rtt_us, peer->srtt_us, peer->rto_us); } /*

From patchwork Wed Dec 4 07:46:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893347
X-Patchwork-Delegate: kuba@kernel.org
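The rxrpc_rtt_rx tracepoint change above passes the raw peer->srtt_us and divides by 8 only in TP_printk(). The code it replaces passed peer->srtt_us >> 3. Together these indicate that rxrpc keeps the smoothed RTT scaled by 8, so that the RFC 6298 gain of 1/8 becomes integer arithmetic.

The sketch below shows the RFC 6298 estimator in that fixed-point form, with SRTT stored scaled by 8 and RTTVAR stored scaled by 4. The names are hypothetical. This is not a copy of rxrpc_rtt_estimator(), which may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point RFC 6298 estimator sketch (hypothetical names).
 * srtt8   holds SRTT * 8   (alpha = 1/8 becomes plain addition)
 * rttvar4 holds RTTVAR * 4 (beta  = 1/4 becomes plain addition) */
struct rtt_est {
	uint32_t srtt8;
	uint32_t rttvar4;
	int have_sample;
};

static void rtt_est_sample(struct rtt_est *e, uint32_t rtt_us)
{
	int64_t err, abs_err;

	if (!e->have_sample) {
		/* RFC 6298 (2.2): SRTT <- R, RTTVAR <- R/2 */
		e->srtt8 = rtt_us << 3;
		e->rttvar4 = rtt_us << 1;	/* (R / 2) scaled by 4 */
		e->have_sample = 1;
		return;
	}

	/* RFC 6298 (2.3): both updates use the SRTT from before this sample. */
	err = (int64_t)rtt_us - (int64_t)(e->srtt8 >> 3);
	abs_err = err < 0 ? -err : err;

	/* RTTVAR <- 3/4 RTTVAR + 1/4 |err|  ==>  rttvar4 += |err| - rttvar4/4 */
	e->rttvar4 = (uint32_t)((int64_t)e->rttvar4 + abs_err -
				(int64_t)(e->rttvar4 >> 2));
	/* SRTT <- 7/8 SRTT + 1/8 R'  ==>  srtt8 += err */
	e->srtt8 = (uint32_t)((int64_t)e->srtt8 + err);
}

/* RTO = SRTT + 4 * RTTVAR (the clock granularity term G is ignored). */
static uint32_t rtt_est_rto(const struct rtt_est *e)
{
	return (e->srtt8 >> 3) + e->rttvar4;
}

/* What a "srtt=%u" field printed as srtt / 8 would show: unscaled SRTT. */
static uint32_t rtt_est_srtt_display(const struct rtt_est *e)
{
	return e->srtt8 / 8;
}
```

Storing the scaled value and dividing only at print time keeps the low three bits of precision in the running average. It also lets the trace record exactly what the estimator holds.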
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 28/39] rxrpc: Display userStatus in rxrpc_rx_ack trace
Date: Wed, 4 Dec 2024 07:46:56 +0000
Message-ID: <20241204074710.990092-29-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Display the userStatus field from the Rx packet header in the rxrpc_rx_ack trace line. This is used for flow control purposes by FS.StoreData-type kafs RPC calls.

Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 6e929f4448ac..7681c67f7d65 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -1031,11 +1031,13 @@ TRACE_EVENT(rxrpc_rx_ack, __field(rxrpc_seq_t, prev) __field(u8, reason) __field(u8, n_acks) + __field(u8, user_status) ), TP_fast_assign( __entry->call = call->debug_id; __entry->serial = sp->hdr.serial; + __entry->user_status = sp->hdr.userStatus; __entry->ack_serial = sp->ack.acked_serial; __entry->first = sp->ack.first_ack; __entry->prev = sp->ack.prev_ack; @@ -1043,11 +1045,12 @@ TRACE_EVENT(rxrpc_rx_ack, __entry->n_acks = sp->ack.nr_acks; ), - TP_printk("c=%08x %08x %s r=%08x f=%08x p=%08x n=%u", + TP_printk("c=%08x %08x %s r=%08x us=%02x f=%08x p=%08x n=%u", __entry->call, __entry->serial, __print_symbolic(__entry->reason, rxrpc_ack_names), __entry->ack_serial, + __entry->user_status, __entry->first, __entry->prev, __entry->n_acks)

From patchwork Wed Dec 4 07:46:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893348
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org, Simon Wilkinson
Subject: [PATCH net-next v2 29/39] rxrpc: Fix the calculation and use of RTO
Date: Wed, 4 Dec 2024 07:46:57 +0000
Message-ID: <20241204074710.990092-30-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Make the following changes to the calculation and use of RTO:

 (1) Fix rxrpc_resend() to use the backed-off RTO value obtained by calling rxrpc_get_rto_backoff() rather than reading peer->rto_us directly. Without this, it may retransmit packets too early.

 (2) An RTO close to the RTT causes a lot of unnecessary resends, because the RTT does not take into account the time the server spends clearing out its receive queue.
Worse, PING-ACKs are answered as fast as possible, so their RTT samples are shorter than those from ACKs requested by DATA packets, which skews the RTT downwards. Fix this by adding 100ms to the RTO and imposing a lower limit of 200ms.

Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Fixes: 37473e416234 ("rxrpc: Clean up the resend algorithm") Signed-off-by: David Howells Suggested-by: Simon Wilkinson cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/call_event.c | 3 ++- net/rxrpc/rtt.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index f71773b18e22..4390c97e3ba6 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -103,7 +103,8 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_ .now = ktime_get_real(), }; struct rxrpc_txqueue *tq = call->tx_queue; - ktime_t lowest_xmit_ts = KTIME_MAX, rto = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC); + ktime_t lowest_xmit_ts = KTIME_MAX; + ktime_t rto = rxrpc_get_rto_backoff(call->peer, false); bool unacked = false; _enter("{%d,%d}", call->tx_bottom, call->tx_top); diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c index e0b7d99854b4..3f1ec8e420a6 100644 --- a/net/rxrpc/rtt.c +++ b/net/rxrpc/rtt.c @@ -27,7 +27,7 @@ static u32 __rxrpc_set_rto(const struct rxrpc_peer *peer) static u32 rxrpc_bound_rto(u32 rto) { - return umin(rto, RXRPC_RTO_MAX); + return clamp(200000, rto + 100000, RXRPC_RTO_MAX); } /*

From patchwork Wed Dec 4 07:46:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893349
X-Patchwork-Delegate: kuba@kernel.org
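The rxrpc_bound_rto() change in the RTO patch above does three things:

- adds 100 ms to the computed RTO;
- applies a 200 ms floor;
- caps the result at RXRPC_RTO_MAX (120 * USEC_PER_SEC).

The kernel documents clamp() as clamp(val, lo, hi) with lo <= hi. The patch, however, passes the constant 200000 as val and rto + 100000 as lo. For ordinary RTOs this gives the same value as clamp(rto + 100000, 200000, RXRPC_RTO_MAX). The cap is only guaranteed when the arguments are in the documented order.

The sketch below spells out the intended bound in plain C. The names are hypothetical, and RTO_MAX_US mirrors RXRPC_RTO_MAX by assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors RXRPC_RTO_MAX, i.e. 120 s expressed in microseconds. */
#define RTO_MAX_US   (120u * 1000000u)
#define RTO_FLOOR_US 200000u	/* 200 ms lower limit */
#define RTO_PAD_US   100000u	/* 100 ms added to every computed RTO */

/* clamp(val, lo, hi) in the documented argument order; requires lo <= hi. */
static uint32_t clamp_u32(uint32_t val, uint32_t lo, uint32_t hi)
{
	assert(lo <= hi);
	return val >= hi ? hi : (val <= lo ? lo : val);
}

/* Pad the computed RTO, then floor it at 200 ms and cap it at the maximum. */
static uint32_t bound_rto_us(uint32_t rto_us)
{
	return clamp_u32(rto_us + RTO_PAD_US, RTO_FLOOR_US, RTO_MAX_US);
}
```

With this ordering, an RTO of 50 ms becomes 200 ms, one of 300 ms becomes 400 ms, and anything that would exceed 120 s is capped at 120 s.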
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 30/39] rxrpc: Fix initial resend timeout
Date: Wed, 4 Dec 2024 07:46:58 +0000
Message-ID: <20241204074710.990092-31-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

The constant for the initial resend timeout is in milliseconds, but the variable it's assigned to is in microseconds.
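To put numbers on the mismatch: the kernel's MSEC_PER_SEC is 1000 and USEC_PER_SEC is 1000000. Read as microseconds, the old (1 * MSEC_PER_SEC) initial value is therefore 1 ms, not the 1 second initial RTO that RFC 6298 section 2.1 specifies. The sketch below uses local copies of those two constants; the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the kernel's unit constants. */
#define K_MSEC_PER_SEC 1000L
#define K_USEC_PER_SEC 1000000L

/* The initial RTO is assigned to a microsecond field, so whatever number the
 * constant evaluates to is interpreted as a count of microseconds. */
static const uint32_t old_init_rto_us = (uint32_t)(1 * K_MSEC_PER_SEC);
static const uint32_t new_init_rto_us = (uint32_t)(1 * K_USEC_PER_SEC);

/* Express a microsecond count in milliseconds for comparison. */
static uint32_t us_to_ms(uint32_t us)
{
	return us / 1000u;
}
```

Before the fix, the first retransmission timer was a thousand times shorter than intended.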
Fix the constant to be in microseconds.

Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/rtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c index 3f1ec8e420a6..aff75e168de8 100644 --- a/net/rxrpc/rtt.c +++ b/net/rxrpc/rtt.c @@ -12,7 +12,7 @@ #include "ar-internal.h" #define RXRPC_RTO_MAX (120 * USEC_PER_SEC) -#define RXRPC_TIMEOUT_INIT ((unsigned int)(1 * MSEC_PER_SEC)) /* RFC6298 2.1 initial RTO value */ +#define RXRPC_TIMEOUT_INIT ((unsigned int)(1 * USEC_PER_SEC)) /* RFC6298 2.1 initial RTO value */ #define rxrpc_jiffies32 ((u32)jiffies) /* As rxrpc_jiffies32 */ static u32 rxrpc_rto_min_us(struct rxrpc_peer *peer)

From patchwork Wed Dec 4 07:46:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893350
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 31/39] rxrpc: Send jumbo DATA packets
Date: Wed, 4 Dec 2024 07:46:59 +0000
Message-ID: <20241204074710.990092-32-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>
X-Mailing-List: netdev@vger.kernel.org

Send jumbo DATA packets if path-MTU probing with padded PING ACK packets shows that there is sufficient capacity to do so. This allows larger chunks of data to be sent without reducing retryability, as the subpackets in a jumbo packet can still be retransmitted individually.

Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/ar-internal.h | 1 + net/rxrpc/call_event.c | 2 +- net/rxrpc/call_object.c | 1 + net/rxrpc/input.c | 3 +++ 4 files changed, 6 insertions(+), 1 deletion(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index d0d0ab453909..1307749a1a74 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -710,6 +710,7 @@ struct rxrpc_call { u16 tx_backoff; /* Delay to insert due to Tx failure (ms) */ u8 tx_winsize; /* Maximum size of Tx window */ #define RXRPC_TX_MAX_WINDOW 128 + u8 tx_jumbo_max; /* Maximum subpkts peer will accept */ ktime_t tx_last_sent; /* Last time a transmission occurred */ /* Received data tracking */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 4390c97e3ba6..39772459426b 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -288,7 +288,7 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call) struct rxrpc_txqueue *tq; struct rxrpc_txbuf *txb; rxrpc_seq_t send_top, seq; - int limit = min(space, 1); + int limit = min(space, max(call->peer->pmtud_jumbo, 1)); /* Order send_top before the contents of the new txbufs and * txqueue pointers diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index bba058055c97..e0644e9a8d21 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -155,6 +155,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, refcount_set(&call->ref, 1); call->debug_id = debug_id; call->tx_total_len = -1; + call->tx_jumbo_max = 1; call->next_rx_timo = 20 * HZ; call->next_req_timo = 1 * HZ; call->ackr_window = 1; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 1eb9c22aba51..a7a249872a54 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -796,8 +796,11 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb peer->ackr_adv_pmtud = true; } else { 
peer->ackr_adv_pmtud = false; + capacity = clamp(capacity, 1, jumbo_max); } + call->tx_jumbo_max = capacity; + if (wake) wake_up(&call->waitq); }

From patchwork Wed Dec 4 07:47:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893351
X-Patchwork-Delegate: kuba@kernel.org
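In the jumbo DATA patch above, rxrpc_transmit_fresh_data() now sets limit = min(space, max(call->peer->pmtud_jumbo, 1)). A transmission therefore carries as many subpackets as path-MTU discovery says will fit in a jumbo packet. It never carries fewer than one, and never more than the transmit window has space for. Separately, in the branch shown, rxrpc_input_ack_trailer() clamps a capacity value into the range 1 to jumbo_max.

The sketch restates both expressions as plain C functions with hypothetical names. It treats space as a count of packets and pmtud_jumbo as a subpacket count; both are assumptions.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* limit = min(space, max(pmtud_jumbo, 1)): send at least one subpacket when
 * the window has room, but never more than the window's free space. */
static int fresh_data_limit(int space, int pmtud_jumbo)
{
	return min_int(space, max_int(pmtud_jumbo, 1));
}

/* capacity = clamp(capacity, 1, jumbo_max); requires jumbo_max >= 1. */
static int clamp_capacity(int capacity, int jumbo_max)
{
	assert(jumbo_max >= 1);
	if (capacity < 1)
		return 1;
	return capacity > jumbo_max ? jumbo_max : capacity;
}
```

The max(..., 1) term means a peer whose probing has not (yet) established any jumbo capacity still gets ordinary single-subpacket DATA, which matches the behaviour of the min(space, 1) limit the patch replaces.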
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S.
Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 32/39] rxrpc: Don't allocate a txbuf for an ACK transmission
Date: Wed, 4 Dec 2024 07:47:00 +0000
Message-ID: <20241204074710.990092-33-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Don't allocate an rxrpc_txbuf struct for an ACK transmission. There's now
no need as the memory to hold the ACK content is allocated with a page frag
allocator. The allocation and freeing of a txbuf is just unnecessary
overhead.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h |   2 -
 net/rxrpc/ar-internal.h      |   5 +-
 net/rxrpc/output.c           | 210 ++++++++++++++++++++++-------------
 net/rxrpc/txbuf.c            |  76 -------------
 4 files changed, 131 insertions(+), 162 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 7681c67f7d65..326a4c257aea 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -462,13 +462,11 @@
 /* ---- Must update size of stat_why_req_ack[] if more are added! */
 
 #define rxrpc_txbuf_traces \
-	EM(rxrpc_txbuf_alloc_ack, "ALLOC ACK ") \
 	EM(rxrpc_txbuf_alloc_data, "ALLOC DATA ") \
 	EM(rxrpc_txbuf_free, "FREE ") \
 	EM(rxrpc_txbuf_get_buffer, "GET BUFFER ") \
 	EM(rxrpc_txbuf_get_trans, "GET TRANS ") \
 	EM(rxrpc_txbuf_get_retrans, "GET RETRANS") \
-	EM(rxrpc_txbuf_put_ack_tx, "PUT ACK TX ") \
 	EM(rxrpc_txbuf_put_cleaned, "PUT CLEANED") \
 	EM(rxrpc_txbuf_put_nomem, "PUT NOMEM ") \
 	EM(rxrpc_txbuf_put_rotated, "PUT ROTATED") \
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 1307749a1a74..db93d7f78902 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -834,11 +834,9 @@ struct rxrpc_txbuf {
 #define RXRPC_TXBUF_WIRE_FLAGS 0xff /* The wire protocol flags */
 #define RXRPC_TXBUF_RESENT 0x100 /* Set if has been resent */
 	__be16 cksum; /* Checksum to go in header */
-	unsigned short ack_rwind; /* ACK receive window */
-	u8 /*enum rxrpc_propose_ack_trace*/ ack_why; /* If ack, why */
 	bool jumboable; /* Can be non-terminal jumbo subpacket */
 	u8 nr_kvec; /* Amount of kvec[] used */
-	struct kvec kvec[3];
+	struct kvec kvec[1];
 };
 
 static inline bool rxrpc_sending_to_server(const struct rxrpc_txbuf *txb)
@@ -1364,7 +1362,6 @@ static inline void rxrpc_sysctl_exit(void) {}
 extern atomic_t rxrpc_nr_txbuf;
 struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_size,
 					   size_t data_align, gfp_t gfp);
-struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_size);
 void rxrpc_get_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what);
 void rxrpc_see_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what);
 void rxrpc_put_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 20bf45317264..a7de8a02f419 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -71,22 +71,97 @@ static void rxrpc_set_keepalive(struct rxrpc_call *call, ktime_t now)
 	trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_keepalive);
 }
 
+/*
+ * Allocate transmission buffers for an ACK and attach them to local->kv[].
+ */
+static int rxrpc_alloc_ack(struct rxrpc_call *call, size_t sack_size)
+{
+	struct rxrpc_wire_header *whdr;
+	struct rxrpc_acktrailer *trailer;
+	struct rxrpc_ackpacket *ack;
+	struct kvec *kv = call->local->kvec;
+	gfp_t gfp = rcu_read_lock_held() ? GFP_ATOMIC | __GFP_NOWARN : GFP_NOFS;
+	void *buf, *buf2 = NULL;
+	u8 *filler;
+
+	buf = page_frag_alloc(&call->local->tx_alloc,
+			      sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp);
+	if (!buf)
+		return -ENOMEM;
+
+	if (sack_size) {
+		buf2 = page_frag_alloc(&call->local->tx_alloc, sack_size, gfp);
+		if (!buf2) {
+			page_frag_free(buf);
+			return -ENOMEM;
+		}
+	}
+
+	whdr = buf;
+	ack = buf + sizeof(*whdr);
+	filler = buf + sizeof(*whdr) + sizeof(*ack) + 1;
+	trailer = buf + sizeof(*whdr) + sizeof(*ack) + 1 + 3;
+
+	kv[0].iov_base = whdr;
+	kv[0].iov_len = sizeof(*whdr) + sizeof(*ack);
+	kv[1].iov_base = buf2;
+	kv[1].iov_len = sack_size;
+	kv[2].iov_base = filler;
+	kv[2].iov_len = 3 + sizeof(*trailer);
+	return 3; /* Number of kvec[] used. */
+}
+
+static void rxrpc_free_ack(struct rxrpc_call *call)
+{
+	page_frag_free(call->local->kvec[0].iov_base);
+	if (call->local->kvec[1].iov_base)
+		page_frag_free(call->local->kvec[1].iov_base);
+}
+
+/*
+ * Record the beginning of an RTT probe.
+ */
+static void rxrpc_begin_rtt_probe(struct rxrpc_call *call, rxrpc_serial_t serial,
+				   ktime_t now, enum rxrpc_rtt_tx_trace why)
+{
+	unsigned long avail = call->rtt_avail;
+	int rtt_slot = 9;
+
+	if (!(avail & RXRPC_CALL_RTT_AVAIL_MASK))
+		goto no_slot;
+
+	rtt_slot = __ffs(avail & RXRPC_CALL_RTT_AVAIL_MASK);
+	if (!test_and_clear_bit(rtt_slot, &call->rtt_avail))
+		goto no_slot;
+
+	call->rtt_serial[rtt_slot] = serial;
+	call->rtt_sent_at[rtt_slot] = now;
+	smp_wmb(); /* Write data before avail bit */
+	set_bit(rtt_slot + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail);
+
+	trace_rxrpc_rtt_tx(call, why, rtt_slot, serial);
+	return;
+
+no_slot:
+	trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_no_slot, rtt_slot, serial);
+}
+
 /*
  * Fill out an ACK packet.
  */
-static void rxrpc_fill_out_ack(struct rxrpc_call *call,
-			       struct rxrpc_txbuf *txb,
-			       u8 ack_reason,
-			       rxrpc_serial_t serial)
+static int rxrpc_fill_out_ack(struct rxrpc_call *call, int nr_kv, u8 ack_reason,
+			      rxrpc_serial_t serial_to_ack, rxrpc_serial_t *_ack_serial)
 {
-	struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base;
-	struct rxrpc_acktrailer *trailer = txb->kvec[2].iov_base + 3;
+	struct kvec *kv = call->local->kvec;
+	struct rxrpc_wire_header *whdr = kv[0].iov_base;
+	struct rxrpc_acktrailer *trailer = kv[2].iov_base + 3;
 	struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1);
 	unsigned int qsize, sack, wrap, to, max_mtu, if_mtu;
 	rxrpc_seq_t window, wtop;
+	ktime_t now = ktime_get_real();
 	int rsize;
-	u8 *filler = txb->kvec[2].iov_base;
-	u8 *sackp = txb->kvec[1].iov_base;
+	u8 *filler = kv[2].iov_base;
+	u8 *sackp = kv[1].iov_base;
 
 	rxrpc_inc_stat(call->rxnet, stat_tx_ack_fill);
 
@@ -94,14 +169,25 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call,
 	wtop = call->ackr_wtop;
 	sack = call->ackr_sack_base % RXRPC_SACK_SIZE;
 
+	*_ack_serial = rxrpc_get_next_serial(call->conn);
+
+	whdr->epoch = htonl(call->conn->proto.epoch);
+	whdr->cid = htonl(call->cid);
+	whdr->callNumber = htonl(call->call_id);
+	whdr->serial = htonl(*_ack_serial);
 	whdr->seq = 0;
 	whdr->type = RXRPC_PACKET_TYPE_ACK;
-	txb->flags |= RXRPC_SLOW_START_OK;
+	whdr->flags = call->conn->out_clientflag | RXRPC_SLOW_START_OK;
+	whdr->userStatus = 0;
+	whdr->securityIndex = call->security_ix;
+	whdr->_rsvd = 0;
+	whdr->serviceId = htons(call->dest_srx.srx_service);
+
 	ack->bufferSpace = 0;
 	ack->maxSkew = 0;
 	ack->firstPacket = htonl(window);
 	ack->previousPacket = htonl(call->rx_highest_seq);
-	ack->serial = htonl(serial);
+	ack->serial = htonl(serial_to_ack);
 	ack->reason = ack_reason;
 	ack->nAcks = wtop - window;
 	filler[0] = 0;
@@ -109,12 +195,10 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call,
 	filler[2] = 0;
 
 	if (ack_reason == RXRPC_ACK_PING)
-		txb->flags |= RXRPC_REQUEST_ACK;
+		whdr->flags |= RXRPC_REQUEST_ACK;
 
 	if (after(wtop, window)) {
-		txb->len += ack->nAcks;
-		txb->kvec[1].iov_base = sackp;
-		txb->kvec[1].iov_len = ack->nAcks;
+		kv[1].iov_len = ack->nAcks;
 		wrap = RXRPC_SACK_SIZE - sack;
 		to = umin(ack->nAcks, RXRPC_SACK_SIZE);
 
@@ -133,7 +217,6 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call,
 
 	qsize = (window - 1) - call->rx_consumed;
 	rsize = max_t(int, call->rx_winsize - qsize, 0);
-	txb->ack_rwind = rsize;
 
 	if_mtu = call->peer->if_mtu - call->peer->hdrsize;
 	if (call->peer->ackr_adv_pmtud) {
@@ -147,48 +230,27 @@ static void rxrpc_fill_out_ack(struct rxrpc_call *call,
 	trailer->ifMTU = htonl(if_mtu);
 	trailer->rwind = htonl(rsize);
 	trailer->jumbo_max = 0; /* Advertise pmtu discovery */
-}
-
-/*
- * Record the beginning of an RTT probe.
- */
-static void rxrpc_begin_rtt_probe(struct rxrpc_call *call, rxrpc_serial_t serial,
-				   ktime_t now, enum rxrpc_rtt_tx_trace why)
-{
-	unsigned long avail = call->rtt_avail;
-	int rtt_slot = 9;
 
-	if (!(avail & RXRPC_CALL_RTT_AVAIL_MASK))
-		goto no_slot;
-
-	rtt_slot = __ffs(avail & RXRPC_CALL_RTT_AVAIL_MASK);
-	if (!test_and_clear_bit(rtt_slot, &call->rtt_avail))
-		goto no_slot;
-
-	call->rtt_serial[rtt_slot] = serial;
-	call->rtt_sent_at[rtt_slot] = now;
-	smp_wmb(); /* Write data before avail bit */
-	set_bit(rtt_slot + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail);
-
-	trace_rxrpc_rtt_tx(call, why, rtt_slot, serial);
-	return;
-
-no_slot:
-	trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_no_slot, rtt_slot, serial);
+	if (ack_reason == RXRPC_ACK_PING)
+		rxrpc_begin_rtt_probe(call, *_ack_serial, now, rxrpc_rtt_tx_ping);
+	if (whdr->flags & RXRPC_REQUEST_ACK)
+		call->peer->rtt_last_req = now;
+	rxrpc_set_keepalive(call, now);
+	return nr_kv;
 }
 
 /*
  * Transmit an ACK packet.
  */
-static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb,
-				  int nr_kv, enum rxrpc_propose_ack_trace why)
+static void rxrpc_send_ack_packet(struct rxrpc_call *call, int nr_kv, size_t len,
+				  rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why)
 {
 	struct kvec *kv = call->local->kvec;
 	struct rxrpc_wire_header *whdr = kv[0].iov_base;
+	struct rxrpc_acktrailer *trailer = kv[2].iov_base + 3;
 	struct rxrpc_connection *conn;
 	struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1);
 	struct msghdr msg;
-	ktime_t now;
 	int ret;
 
 	if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags))
@@ -202,41 +264,31 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
 	msg.msg_controllen = 0;
 	msg.msg_flags = MSG_SPLICE_PAGES;
 
-	whdr->flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS;
-
-	txb->serial = rxrpc_get_next_serial(conn);
-	whdr->serial = htonl(txb->serial);
-	trace_rxrpc_tx_ack(call->debug_id, txb->serial,
+	trace_rxrpc_tx_ack(call->debug_id, serial,
 			   ntohl(ack->firstPacket), ntohl(ack->serial),
 			   ack->reason, ack->nAcks,
-			   txb->ack_rwind);
+			   ntohl(trailer->rwind));
 
 	rxrpc_inc_stat(call->rxnet, stat_tx_ack_send);
 
-	iov_iter_kvec(&msg.msg_iter, WRITE, kv, nr_kv, txb->len);
+	iov_iter_kvec(&msg.msg_iter, WRITE, kv, nr_kv, len);
 	rxrpc_local_dont_fragment(conn->local, why == rxrpc_propose_ack_ping_for_mtu_probe);
 
-	ret = do_udp_sendmsg(conn->local->socket, &msg, txb->len);
+	ret = do_udp_sendmsg(conn->local->socket, &msg, len);
 	call->peer->last_tx_at = ktime_get_seconds();
 	if (ret < 0) {
-		trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret,
+		trace_rxrpc_tx_fail(call->debug_id, serial, ret,
 				    rxrpc_tx_point_call_ack);
 		if (why == rxrpc_propose_ack_ping_for_mtu_probe &&
 		    ret == -EMSGSIZE)
-			rxrpc_input_probe_for_pmtud(conn, txb->serial, true);
+			rxrpc_input_probe_for_pmtud(conn, serial, true);
 	} else {
 		trace_rxrpc_tx_packet(call->debug_id, whdr,
 				      rxrpc_tx_point_call_ack);
-		now = ktime_get_real();
-		if (ack->reason == RXRPC_ACK_PING)
-			rxrpc_begin_rtt_probe(call, txb->serial, now, rxrpc_rtt_tx_ping);
-		if (txb->flags & RXRPC_REQUEST_ACK)
-			call->peer->rtt_last_req = now;
-		rxrpc_set_keepalive(call, now);
 		if (why == rxrpc_propose_ack_ping_for_mtu_probe) {
 			call->peer->pmtud_pending = false;
 			call->peer->pmtud_probing = true;
-			call->conn->pmtud_probe = txb->serial;
+			call->conn->pmtud_probe = serial;
 			call->conn->pmtud_call = call->debug_id;
 			trace_rxrpc_pmtud_tx(call);
 		}
@@ -248,10 +300,11 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *t
  * Queue an ACK for immediate transmission.
  */
 void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason,
-		    rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why)
+		    rxrpc_serial_t serial_to_ack, enum rxrpc_propose_ack_trace why)
 {
-	struct rxrpc_txbuf *txb;
 	struct kvec *kv = call->local->kvec;
+	rxrpc_serial_t ack_serial;
+	size_t len;
 	int nr_kv;
 
 	if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags))
@@ -259,32 +312,29 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason,
 
 	rxrpc_inc_stat(call->rxnet, stat_tx_acks[ack_reason]);
 
-	txb = rxrpc_alloc_ack_txbuf(call, call->ackr_wtop - call->ackr_window);
-	if (!txb) {
+	nr_kv = rxrpc_alloc_ack(call, call->ackr_wtop - call->ackr_window);
+	if (nr_kv < 0) {
 		kleave(" = -ENOMEM");
 		return;
 	}
 
-	txb->ack_why = why;
-
-	rxrpc_fill_out_ack(call, txb, ack_reason, serial);
+	nr_kv = rxrpc_fill_out_ack(call, nr_kv, ack_reason, serial_to_ack, &ack_serial);
+	len = kv[0].iov_len;
+	len += kv[1].iov_len;
+	len += kv[2].iov_len;
 
 	/* Extend a path MTU probe ACK. */
-	nr_kv = txb->nr_kvec;
-	kv[0] = txb->kvec[0];
-	kv[1] = txb->kvec[1];
-	kv[2] = txb->kvec[2];
 	if (why == rxrpc_propose_ack_ping_for_mtu_probe) {
 		size_t probe_mtu = call->peer->pmtud_trial +
 			sizeof(struct rxrpc_wire_header);
 
-		if (txb->len > probe_mtu)
+		if (len > probe_mtu)
 			goto skip;
-		while (txb->len < probe_mtu) {
-			size_t part = umin(probe_mtu - txb->len, PAGE_SIZE);
+		while (len < probe_mtu) {
+			size_t part = umin(probe_mtu - len, PAGE_SIZE);
 			kv[nr_kv].iov_base = page_address(ZERO_PAGE(0));
 			kv[nr_kv].iov_len = part;
-			txb->len += part;
+			len += part;
 			nr_kv++;
 		}
 	}
@@ -293,10 +343,10 @@ void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason,
 	atomic_set(&call->ackr_nr_consumed, 0);
 	clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags);
 
-	trace_rxrpc_send_ack(call, why, ack_reason, serial);
-	rxrpc_send_ack_packet(call, txb, nr_kv, why);
+	trace_rxrpc_send_ack(call, why, ack_reason, ack_serial);
+	rxrpc_send_ack_packet(call, nr_kv, len, ack_serial, why);
 skip:
-	rxrpc_put_txbuf(txb, rxrpc_txbuf_put_ack_tx);
+	rxrpc_free_ack(call);
 }
 
 /*
diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c
index 067223c8c35f..131d9e55c8e9 100644
--- a/net/rxrpc/txbuf.c
+++ b/net/rxrpc/txbuf.c
@@ -73,82 +73,6 @@ struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_
 	return txb;
 }
 
-/*
- * Allocate and partially initialise an ACK packet.
- */
-struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_size)
-{
-	struct rxrpc_wire_header *whdr;
-	struct rxrpc_acktrailer *trailer;
-	struct rxrpc_ackpacket *ack;
-	struct rxrpc_txbuf *txb;
-	gfp_t gfp = rcu_read_lock_held() ? GFP_ATOMIC | __GFP_NOWARN : GFP_NOFS;
-	void *buf, *buf2 = NULL;
-	u8 *filler;
-
-	txb = kmalloc(sizeof(*txb), gfp);
-	if (!txb)
-		return NULL;
-
-	buf = page_frag_alloc(&call->local->tx_alloc,
-			      sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp);
-	if (!buf) {
-		kfree(txb);
-		return NULL;
-	}
-
-	if (sack_size) {
-		buf2 = page_frag_alloc(&call->local->tx_alloc, sack_size, gfp);
-		if (!buf2) {
-			page_frag_free(buf);
-			kfree(txb);
-			return NULL;
-		}
-	}
-
-	whdr = buf;
-	ack = buf + sizeof(*whdr);
-	filler = buf + sizeof(*whdr) + sizeof(*ack) + 1;
-	trailer = buf + sizeof(*whdr) + sizeof(*ack) + 1 + 3;
-
-	refcount_set(&txb->ref, 1);
-	txb->call_debug_id = call->debug_id;
-	txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids);
-	txb->space = 0;
-	txb->len = sizeof(*whdr) + sizeof(*ack) + 3 + sizeof(*trailer);
-	txb->offset = 0;
-	txb->flags = call->conn->out_clientflag;
-	txb->ack_rwind = 0;
-	txb->seq = 0;
-	txb->serial = 0;
-	txb->cksum = 0;
-	txb->nr_kvec = 3;
-	txb->kvec[0].iov_base = whdr;
-	txb->kvec[0].iov_len = sizeof(*whdr) + sizeof(*ack);
-	txb->kvec[1].iov_base = buf2;
-	txb->kvec[1].iov_len = sack_size;
-	txb->kvec[2].iov_base = filler;
-	txb->kvec[2].iov_len = 3 + sizeof(*trailer);
-
-	whdr->epoch = htonl(call->conn->proto.epoch);
-	whdr->cid = htonl(call->cid);
-	whdr->callNumber = htonl(call->call_id);
-	whdr->seq = 0;
-	whdr->type = RXRPC_PACKET_TYPE_ACK;
-	whdr->flags = 0;
-	whdr->userStatus = 0;
-	whdr->securityIndex = call->security_ix;
-	whdr->_rsvd = 0;
-	whdr->serviceId = htons(call->dest_srx.srx_service);
-
-	get_page(virt_to_head_page(trailer));
-
-	trace_rxrpc_txbuf(txb->debug_id, txb->call_debug_id, txb->seq, 1,
-			  rxrpc_txbuf_alloc_ack);
-	atomic_inc(&rxrpc_nr_txbuf);
-	return txb;
-}
-
 void rxrpc_get_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what)
 {
 	int r;

From patchwork Wed Dec 4 07:47:01 2024
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893352
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells,
 Marc Dionne, Yunsheng Lin, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 33/39] rxrpc: Use irq-disabling spinlocks between app and I/O thread
Date: Wed, 4 Dec 2024 07:47:01 +0000
Message-ID: <20241204074710.990092-34-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Where a spinlock is used by both the application thread and the I/O
thread, use irq-disabling locking so that an interrupt taken on the app
thread doesn't also slow down the I/O thread.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 net/rxrpc/af_rxrpc.c    |  4 ++--
 net/rxrpc/ar-internal.h |  1 -
 net/rxrpc/call_accept.c | 20 ++++++++++----------
 net/rxrpc/call_object.c | 15 +++++++--------
 net/rxrpc/conn_client.c | 12 ++++++------
 net/rxrpc/conn_event.c  |  8 ++++----
 net/rxrpc/conn_object.c |  8 ++++----
 net/rxrpc/input.c       |  5 +----
 net/rxrpc/io_thread.c   |  8 ++++----
 net/rxrpc/peer_event.c  |  8 ++++----
 net/rxrpc/peer_object.c |  1 +
 net/rxrpc/recvmsg.c     | 18 +++++++++---------
 net/rxrpc/security.c    |  4 ++--
 net/rxrpc/sendmsg.c     |  2 --
 14 files changed, 54 insertions(+), 60 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9d8bd0b37e41..86873399f7d5 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -408,9 +408,9 @@ void rxrpc_kernel_shutdown_call(struct socket *sock, struct rxrpc_call *call)
 
 		/* Make sure we're not going to call back into a kernel service */
 		if (call->notify_rx) {
-			spin_lock(&call->notify_lock);
+			spin_lock_irq(&call->notify_lock);
 			call->notify_rx = rxrpc_dummy_notify_rx;
-			spin_unlock(&call->notify_lock);
+			spin_unlock_irq(&call->notify_lock);
 		}
 	}
 	mutex_unlock(&call->user_mutex);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index db93d7f78902..ffd80dc88f40 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -700,7 +700,6 @@ struct rxrpc_call {
 	struct rxrpc_txqueue *send_queue; /* Queue that sendmsg is writing into */
 
 	/* Transmitted data tracking. */
-	spinlock_t tx_lock; /* Transmit queue lock */
 	struct rxrpc_txqueue *tx_queue; /* Start of transmission buffers */
 	struct rxrpc_txqueue *tx_qtail; /* End of transmission buffers */
 	rxrpc_seq_t tx_qbase; /* First slot in tx_queue */
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index a6776b1604ba..e685034ce4f7 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -188,8 +188,8 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
 	/* Make sure that there aren't any incoming calls in progress before we
 	 * clear the preallocation buffers.
 	 */
-	spin_lock(&rx->incoming_lock);
-	spin_unlock(&rx->incoming_lock);
+	spin_lock_irq(&rx->incoming_lock);
+	spin_unlock_irq(&rx->incoming_lock);
 
 	head = b->peer_backlog_head;
 	tail = b->peer_backlog_tail;
@@ -343,7 +343,7 @@ bool rxrpc_new_incoming_call(struct rxrpc_local *local,
 	if (sp->hdr.type != RXRPC_PACKET_TYPE_DATA)
 		return rxrpc_protocol_error(skb, rxrpc_eproto_no_service_call);
 
-	read_lock(&local->services_lock);
+	read_lock_irq(&local->services_lock);
 
 	/* Weed out packets to services we're not offering. Packets that would
Packets that would * begin a call are explicitly rejected and the rest are just @@ -399,12 +399,12 @@ bool rxrpc_new_incoming_call(struct rxrpc_local *local, spin_unlock(&conn->state_lock); spin_unlock(&rx->incoming_lock); - read_unlock(&local->services_lock); + read_unlock_irq(&local->services_lock); if (hlist_unhashed(&call->error_link)) { - spin_lock(&call->peer->lock); + spin_lock_irq(&call->peer->lock); hlist_add_head(&call->error_link, &call->peer->error_targets); - spin_unlock(&call->peer->lock); + spin_unlock_irq(&call->peer->lock); } _leave(" = %p{%d}", call, call->debug_id); @@ -413,20 +413,20 @@ bool rxrpc_new_incoming_call(struct rxrpc_local *local, return true; unsupported_service: - read_unlock(&local->services_lock); + read_unlock_irq(&local->services_lock); return rxrpc_direct_abort(skb, rxrpc_abort_service_not_offered, RX_INVALID_OPERATION, -EOPNOTSUPP); unsupported_security: - read_unlock(&local->services_lock); + read_unlock_irq(&local->services_lock); return rxrpc_direct_abort(skb, rxrpc_abort_service_not_offered, RX_INVALID_OPERATION, -EKEYREJECTED); no_call: spin_unlock(&rx->incoming_lock); - read_unlock(&local->services_lock); + read_unlock_irq(&local->services_lock); _leave(" = f [%u]", skb->mark); return false; discard: - read_unlock(&local->services_lock); + read_unlock_irq(&local->services_lock); return true; } diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index e0644e9a8d21..75cd0b06e14c 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -49,7 +49,7 @@ void rxrpc_poke_call(struct rxrpc_call *call, enum rxrpc_call_poke_trace what) bool busy; if (!test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) { - spin_lock_bh(&local->lock); + spin_lock_irq(&local->lock); busy = !list_empty(&call->attend_link); trace_rxrpc_poke_call(call, busy, what); if (!busy && !rxrpc_try_get_call(call, rxrpc_call_get_poke)) @@ -57,7 +57,7 @@ void rxrpc_poke_call(struct rxrpc_call *call, enum rxrpc_call_poke_trace what) if (!busy) { 
 			list_add_tail(&call->attend_link, &local->call_attend_q);
 		}
-		spin_unlock_bh(&local->lock);
+		spin_unlock_irq(&local->lock);
 		if (!busy)
 			rxrpc_wake_up_io_thread(local);
 	}
@@ -151,7 +151,6 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	skb_queue_head_init(&call->rx_oos_queue);
 	init_waitqueue_head(&call->waitq);
 	spin_lock_init(&call->notify_lock);
-	spin_lock_init(&call->tx_lock);
 	refcount_set(&call->ref, 1);
 	call->debug_id = debug_id;
 	call->tx_total_len = -1;
@@ -302,9 +301,9 @@ static int rxrpc_connect_call(struct rxrpc_call *call, gfp_t gfp)
 
 	trace_rxrpc_client(NULL, -1, rxrpc_client_queue_new_call);
 	rxrpc_get_call(call, rxrpc_call_get_io_thread);
-	spin_lock(&local->client_call_lock);
+	spin_lock_irq(&local->client_call_lock);
 	list_add_tail(&call->wait_link, &local->new_client_calls);
-	spin_unlock(&local->client_call_lock);
+	spin_unlock_irq(&local->client_call_lock);
 	rxrpc_wake_up_io_thread(local);
 	return 0;
@@ -434,7 +433,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
 
 /*
  * Set up an incoming call. call->conn points to the connection.
- * This is called in BH context and isn't allowed to fail.
+ * This is called with interrupts disabled and isn't allowed to fail.
  */
 void rxrpc_incoming_call(struct rxrpc_sock *rx,
 			 struct rxrpc_call *call,
@@ -576,7 +575,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
 	rxrpc_put_call_slot(call);
 
 	/* Make sure we don't get any more notifications */
-	spin_lock(&rx->recvmsg_lock);
+	spin_lock_irq(&rx->recvmsg_lock);
 
 	if (!list_empty(&call->recvmsg_link)) {
 		_debug("unlinking once-pending call %p { e=%lx f=%lx }",
@@ -589,7 +588,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
 	call->recvmsg_link.next = NULL;
 	call->recvmsg_link.prev = NULL;
 
-	spin_unlock(&rx->recvmsg_lock);
+	spin_unlock_irq(&rx->recvmsg_lock);
 	if (put)
 		rxrpc_put_call(call, rxrpc_call_put_unnotify);
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 5f76bd90567c..db0099197890 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -510,9 +510,9 @@ void rxrpc_connect_client_calls(struct rxrpc_local *local)
 	struct rxrpc_call *call;
 	LIST_HEAD(new_client_calls);
 
-	spin_lock(&local->client_call_lock);
+	spin_lock_irq(&local->client_call_lock);
 	list_splice_tail_init(&local->new_client_calls, &new_client_calls);
-	spin_unlock(&local->client_call_lock);
+	spin_unlock_irq(&local->client_call_lock);
 
 	while ((call = list_first_entry_or_null(&new_client_calls,
 						struct rxrpc_call, wait_link))) {
@@ -547,9 +547,9 @@ void rxrpc_expose_client_call(struct rxrpc_call *call)
 		set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags);
 		trace_rxrpc_client(conn, channel, rxrpc_client_exposed);
 
-		spin_lock(&call->peer->lock);
+		spin_lock_irq(&call->peer->lock);
 		hlist_add_head(&call->error_link, &call->peer->error_targets);
-		spin_unlock(&call->peer->lock);
+		spin_unlock_irq(&call->peer->lock);
 	}
 }
@@ -590,9 +590,9 @@ void rxrpc_disconnect_client_call(struct rxrpc_bundle *bundle, struct rxrpc_call
 		ASSERTCMP(call->call_id, ==, 0);
 		ASSERT(!test_bit(RXRPC_CALL_EXPOSED, &call->flags));
 		/* May still be on ->new_client_calls. */
-		spin_lock(&local->client_call_lock);
+		spin_lock_irq(&local->client_call_lock);
 		list_del_init(&call->wait_link);
-		spin_unlock(&local->client_call_lock);
+		spin_unlock_irq(&local->client_call_lock);
 		return;
 	}
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index f6c02cc44d98..6b29a294ee07 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -26,7 +26,7 @@ static bool rxrpc_set_conn_aborted(struct rxrpc_connection *conn, struct sk_buff
 	bool aborted = false;
 
 	if (conn->state != RXRPC_CONN_ABORTED) {
-		spin_lock(&conn->state_lock);
+		spin_lock_irq(&conn->state_lock);
 		if (conn->state != RXRPC_CONN_ABORTED) {
 			conn->abort_code = abort_code;
 			conn->error = err;
@@ -37,7 +37,7 @@ static bool rxrpc_set_conn_aborted(struct rxrpc_connection *conn, struct sk_buff
 			set_bit(RXRPC_CONN_EV_ABORT_CALLS, &conn->events);
 			aborted = true;
 		}
-		spin_unlock(&conn->state_lock);
+		spin_unlock_irq(&conn->state_lock);
 	}
 
 	return aborted;
@@ -261,10 +261,10 @@ static int rxrpc_process_event(struct rxrpc_connection *conn,
 		if (ret < 0)
 			return ret;
 
-		spin_lock(&conn->state_lock);
+		spin_lock_irq(&conn->state_lock);
 		if (conn->state == RXRPC_CONN_SERVICE_CHALLENGING)
 			conn->state = RXRPC_CONN_SERVICE;
-		spin_unlock(&conn->state_lock);
+		spin_unlock_irq(&conn->state_lock);
 
 		if (conn->state == RXRPC_CONN_SERVICE) {
 			/* Offload call state flipping to the I/O thread. As
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index b0627398311b..7eba4d7d9a38 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -31,13 +31,13 @@ void rxrpc_poke_conn(struct rxrpc_connection *conn, enum rxrpc_conn_trace why)
 	if (WARN_ON_ONCE(!local))
 		return;
 
-	spin_lock_bh(&local->lock);
+	spin_lock_irq(&local->lock);
 	busy = !list_empty(&conn->attend_link);
 	if (!busy) {
 		rxrpc_get_connection(conn, why);
 		list_add_tail(&conn->attend_link, &local->conn_attend_q);
 	}
-	spin_unlock_bh(&local->lock);
+	spin_unlock_irq(&local->lock);
 	rxrpc_wake_up_io_thread(local);
 }
@@ -196,9 +196,9 @@ void rxrpc_disconnect_call(struct rxrpc_call *call)
 	call->peer->cong_ssthresh = call->cong_ssthresh;
 
 	if (!hlist_unhashed(&call->error_link)) {
-		spin_lock(&call->peer->lock);
+		spin_lock_irq(&call->peer->lock);
 		hlist_del_init(&call->error_link);
-		spin_unlock(&call->peer->lock);
+		spin_unlock_irq(&call->peer->lock);
 	}
 
 	if (rxrpc_is_client_call(call)) {
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index a7a249872a54..821e10c03086 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -424,7 +424,7 @@ static void rxrpc_input_queue_data(struct rxrpc_call *call, struct sk_buff *skb,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	bool last = sp->hdr.flags & RXRPC_LAST_PACKET;
 
-	__skb_queue_tail(&call->recvmsg_queue, skb);
+	skb_queue_tail(&call->recvmsg_queue, skb);
 	rxrpc_input_update_ack_window(call, window, wtop);
 	trace_rxrpc_receive(call, last ? why + 1 : why, sp->hdr.serial, sp->hdr.seq);
 	if (last)
@@ -501,7 +501,6 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb,
 
 		rxrpc_get_skb(skb, rxrpc_skb_get_to_recvmsg);
 
-		spin_lock(&call->recvmsg_queue.lock);
 		rxrpc_input_queue_data(call, skb, window, wtop, rxrpc_receive_queue);
 		*_notify = true;
 
@@ -523,8 +522,6 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb,
 						       rxrpc_receive_queue_oos);
 		}
 
-		spin_unlock(&call->recvmsg_queue.lock);
-
 		call->ackr_sack_base = sack;
 	} else {
 		unsigned int slot;
diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c
index bc678a299bd8..fbacf2056f64 100644
--- a/net/rxrpc/io_thread.c
+++ b/net/rxrpc/io_thread.c
@@ -500,9 +500,9 @@ int rxrpc_io_thread(void *data)
 		}
 
 		/* Deal with connections that want immediate attention. */
-		spin_lock_bh(&local->lock);
+		spin_lock_irq(&local->lock);
 		list_splice_tail_init(&local->conn_attend_q, &conn_attend_q);
-		spin_unlock_bh(&local->lock);
+		spin_unlock_irq(&local->lock);
 
 		while ((conn = list_first_entry_or_null(&conn_attend_q,
 							struct rxrpc_connection,
@@ -519,9 +519,9 @@ int rxrpc_io_thread(void *data)
 		rxrpc_discard_expired_client_conns(local);
 
 		/* Deal with calls that want immediate attention. */
-		spin_lock_bh(&local->lock);
+		spin_lock_irq(&local->lock);
 		list_splice_tail_init(&local->call_attend_q, &call_attend_q);
-		spin_unlock_bh(&local->lock);
+		spin_unlock_irq(&local->lock);
 
 		while ((call = list_first_entry_or_null(&call_attend_q,
 							struct rxrpc_call,
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index ff30e0c05507..d82e44a3901b 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -213,23 +213,23 @@ static void rxrpc_distribute_error(struct rxrpc_peer *peer, struct sk_buff *skb,
 	struct rxrpc_call *call;
 	HLIST_HEAD(error_targets);
 
-	spin_lock(&peer->lock);
+	spin_lock_irq(&peer->lock);
 	hlist_move_list(&peer->error_targets, &error_targets);
 
 	while (!hlist_empty(&error_targets)) {
 		call = hlist_entry(error_targets.first,
 				   struct rxrpc_call, error_link);
 		hlist_del_init(&call->error_link);
-		spin_unlock(&peer->lock);
+		spin_unlock_irq(&peer->lock);
 
 		rxrpc_see_call(call, rxrpc_call_see_distribute_error);
 		rxrpc_set_call_completion(call, compl, 0, -err);
 		rxrpc_input_call_event(call);
 
-		spin_lock(&peer->lock);
+		spin_lock_irq(&peer->lock);
 	}
 
-	spin_unlock(&peer->lock);
+	spin_unlock_irq(&peer->lock);
 }
 
 /*
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 80ef6f06d512..27b34ed4d76a 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -320,6 +320,7 @@ static void rxrpc_free_peer(struct rxrpc_peer *peer)
  * Set up a new incoming peer. There shouldn't be any other matching peers
  * since we've already done a search in the list from the non-reentrant context
  * (the data_ready handler) that is the only place we can add new peers.
+ * Called with interrupts disabled.
*/ void rxrpc_new_incoming_peer(struct rxrpc_local *local, struct rxrpc_peer *peer) { diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index a482f88c5fc5..32cd5f1d541d 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -36,16 +36,16 @@ void rxrpc_notify_socket(struct rxrpc_call *call) sk = &rx->sk; if (rx && sk->sk_state < RXRPC_CLOSE) { if (call->notify_rx) { - spin_lock(&call->notify_lock); + spin_lock_irq(&call->notify_lock); call->notify_rx(sk, call, call->user_call_ID); - spin_unlock(&call->notify_lock); + spin_unlock_irq(&call->notify_lock); } else { - spin_lock(&rx->recvmsg_lock); + spin_lock_irq(&rx->recvmsg_lock); if (list_empty(&call->recvmsg_link)) { rxrpc_get_call(call, rxrpc_call_get_notify_socket); list_add_tail(&call->recvmsg_link, &rx->recvmsg_q); } - spin_unlock(&rx->recvmsg_lock); + spin_unlock_irq(&rx->recvmsg_lock); if (!sock_flag(sk, SOCK_DEAD)) { _debug("call %ps", sk->sk_data_ready); @@ -337,14 +337,14 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, * We also want to weed out calls that got requeued whilst we were * shovelling data out. 
*/ - spin_lock(&rx->recvmsg_lock); + spin_lock_irq(&rx->recvmsg_lock); l = rx->recvmsg_q.next; call = list_entry(l, struct rxrpc_call, recvmsg_link); if (!rxrpc_call_is_complete(call) && skb_queue_empty(&call->recvmsg_queue)) { list_del_init(&call->recvmsg_link); - spin_unlock(&rx->recvmsg_lock); + spin_unlock_irq(&rx->recvmsg_lock); release_sock(&rx->sk); trace_rxrpc_recvmsg(call->debug_id, rxrpc_recvmsg_unqueue, 0); rxrpc_put_call(call, rxrpc_call_put_recvmsg); @@ -355,7 +355,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, list_del_init(&call->recvmsg_link); else rxrpc_get_call(call, rxrpc_call_get_recvmsg); - spin_unlock(&rx->recvmsg_lock); + spin_unlock_irq(&rx->recvmsg_lock); call_debug_id = call->debug_id; trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_dequeue, 0); @@ -445,9 +445,9 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, error_requeue_call: if (!(flags & MSG_PEEK)) { - spin_lock(&rx->recvmsg_lock); + spin_lock_irq(&rx->recvmsg_lock); list_add(&call->recvmsg_link, &rx->recvmsg_q); - spin_unlock(&rx->recvmsg_lock); + spin_unlock_irq(&rx->recvmsg_lock); trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_requeue, 0); } else { rxrpc_put_call(call, rxrpc_call_put_recvmsg); diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c index cb8dd1d3b1d4..9784adc8f275 100644 --- a/net/rxrpc/security.c +++ b/net/rxrpc/security.c @@ -114,10 +114,10 @@ int rxrpc_init_client_conn_security(struct rxrpc_connection *conn) if (conn->state == RXRPC_CONN_CLIENT_UNSECURED) { ret = conn->security->init_connection_security(conn, token); if (ret == 0) { - spin_lock(&conn->state_lock); + spin_lock_irq(&conn->state_lock); if (conn->state == RXRPC_CONN_CLIENT_UNSECURED) conn->state = RXRPC_CONN_CLIENT; - spin_unlock(&conn->state_lock); + spin_unlock_irq(&conn->state_lock); } } mutex_unlock(&conn->security_lock); diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 381b25597f4e..df501a7c92fa 100644 --- 
a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -261,7 +261,6 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, trace_rxrpc_tq(call, sq, seq, rxrpc_tq_queue); /* Add the packet to the call's output buffer */ - spin_lock(&call->tx_lock); poke = (READ_ONCE(call->tx_bottom) == call->send_top); sq->bufs[ix] = txb; /* Order send_top after the queue->next pointer and txb content. */ @@ -270,7 +269,6 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, rxrpc_notify_end_tx(rx, call, notify_end_tx); call->send_queue = NULL; } - spin_unlock(&call->tx_lock); if (poke) rxrpc_poke_call(call, rxrpc_call_poke_start); From patchwork Wed Dec 4 07:47:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893353 X-Patchwork-Delegate: kuba@kernel.org
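The preceding patch's hunks above switch several rxrpc locks from spin_lock()/spin_lock_bh() to spin_lock_irq(), and in rxrpc_input_data_one() they drop the explicit spin_lock(&call->recvmsg_queue.lock) around __skb_queue_tail() in favour of skb_queue_tail(), which takes the queue's own lock internally (in the kernel it does so with spin_lock_irqsave()). The sketch below is a hypothetical userspace analogue of that locked/unlocked helper pairing, not kernel code: a pthread mutex stands in for the spinlock, interrupt masking has no userspace counterpart, and every name in it (item_queue, queue_tail(), queue_tail_locked(), run_producers()) is invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical analogue of struct sk_buff_head: a list that carries its own
 * lock. A pthread mutex stands in for the kernel spinlock; userspace has no
 * equivalent of disabling local interrupts. */
struct item {
	struct item *next;
	int value;
};

struct item_queue {
	struct item *head, *tail;
	unsigned int len;
	pthread_mutex_t lock;
};

static void queue_init(struct item_queue *q)
{
	q->head = q->tail = NULL;
	q->len = 0;
	pthread_mutex_init(&q->lock, NULL);
}

/* Counterpart of __skb_queue_tail(): the caller must already hold q->lock. */
static void queue_tail_locked(struct item_queue *q, struct item *it)
{
	it->next = NULL;
	if (q->tail)
		q->tail->next = it;
	else
		q->head = it;
	q->tail = it;
	q->len++;
}

/* Counterpart of skb_queue_tail(): takes the queue's own lock itself, so
 * callers need no separate lock/unlock around the append. */
static void queue_tail(struct item_queue *q, struct item *it)
{
	pthread_mutex_lock(&q->lock);
	queue_tail_locked(q, it);
	pthread_mutex_unlock(&q->lock);
}

struct producer_arg {
	struct item_queue *q;
	int count;
};

/* Each producer appends 'count' heap-allocated items with no extra locking. */
static void *producer(void *p)
{
	const struct producer_arg *arg = p;

	for (int i = 0; i < arg->count; i++) {
		struct item *it = malloc(sizeof(*it));

		if (!it)
			abort();
		it->value = i;
		queue_tail(arg->q, it);
	}
	return NULL;
}

/* Run up to 16 producers concurrently; return the resulting queue length. */
static unsigned int run_producers(struct item_queue *q, int nthreads, int per_thread)
{
	pthread_t tids[16];
	struct producer_arg arg = { q, per_thread };

	assert(nthreads > 0 && nthreads <= 16);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[i], NULL, producer, &arg) != 0)
			abort();
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return q->len;
}

/* Free the heap-allocated items and release the lock. */
static void queue_free(struct item_queue *q)
{
	struct item *it = q->head;

	while (it) {
		struct item *next = it->next;

		free(it);
		it = next;
	}
	q->head = q->tail = NULL;
	q->len = 0;
	pthread_mutex_destroy(&q->lock);
}
```

Keeping both variants lets a caller that already holds the lock for a larger critical section use the unlocked one, which is the shape of the removed spin_lock() plus __skb_queue_tail() pairing; once nothing else in that section needs the lock, the self-locking helper is the simpler choice.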
From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 34/39] rxrpc: Tidy up the ACK parsing a bit Date: Wed, 4 Dec 2024 07:47:02 +0000 Message-ID: <20241204074710.990092-35-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 X-Patchwork-Delegate: kuba@kernel.org Tidy up the ACK parsing in the following ways: (1) Put the serial number of the ACK packet into the rxrpc_ack_summary struct and access it from there whilst parsing an ACK. (2) Be consistent about using "if (summary.acked_serial)" rather than "if (summary.acked_serial != 0)". Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/ar-internal.h | 1 + net/rxrpc/input.c | 55 +++++++++++++++++++---------------------- 2 files changed, 27 insertions(+), 29 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index ffd80dc88f40..aa240b4b4bec 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -771,6 +771,7 @@ struct rxrpc_call { * Summary of a new ACK and the changes it made to the Tx buffer packet states.
*/ struct rxrpc_ack_summary { + rxrpc_serial_t ack_serial; /* Serial number of ACK */ rxrpc_serial_t acked_serial; /* Serial number ACK'd */ u16 in_flight; /* Number of unreceived transmissions */ u16 nr_new_hacks; /* Number of rotated new ACKs */ diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 821e10c03086..036cf440b63b 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -198,11 +198,10 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) static void rxrpc_add_data_rtt_sample(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, struct rxrpc_txqueue *tq, - int ix, - rxrpc_serial_t ack_serial) + int ix) { rxrpc_peer_add_rtt(call, rxrpc_rtt_rx_data_ack, -1, - summary->acked_serial, ack_serial, + summary->acked_serial, summary->ack_serial, ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]), call->acks_latest_ts); summary->rtt_sample_avail = false; @@ -213,8 +212,7 @@ static void rxrpc_add_data_rtt_sample(struct rxrpc_call *call, * Apply a hard ACK by advancing the Tx window. */ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, - struct rxrpc_ack_summary *summary, - rxrpc_serial_t ack_serial) + struct rxrpc_ack_summary *summary) { struct rxrpc_txqueue *tq = call->tx_queue; rxrpc_seq_t seq = call->tx_bottom + 1; @@ -255,7 +253,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, if (summary->rtt_sample_avail && summary->acked_serial == tq->segment_serial[ix] && test_bit(ix, &tq->rtt_samples)) - rxrpc_add_data_rtt_sample(call, summary, tq, ix, ack_serial); + rxrpc_add_data_rtt_sample(call, summary, tq, ix); if (ix == tq->nr_reported_acks) { /* Packet directly hard ACK'd. 
*/ @@ -369,7 +367,7 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call) } if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) { - if (!rxrpc_rotate_tx_window(call, top, &summary, 0)) { + if (!rxrpc_rotate_tx_window(call, top, &summary)) { rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply); return false; } @@ -826,12 +824,11 @@ static void rxrpc_input_ack_trailer(struct rxrpc_call *call, struct sk_buff *skb */ static void rxrpc_input_soft_rtt(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, - struct rxrpc_txqueue *tq, - rxrpc_serial_t ack_serial) + struct rxrpc_txqueue *tq) { for (int ix = 0; ix < RXRPC_NR_TXQUEUE; ix++) if (summary->acked_serial == tq->segment_serial[ix]) - return rxrpc_add_data_rtt_sample(call, summary, tq, ix, ack_serial); + return rxrpc_add_data_rtt_sample(call, summary, tq, ix); } /* @@ -944,7 +941,7 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, _debug("bound %16lx %u", extracted, nr); if (summary->rtt_sample_avail) - rxrpc_input_soft_rtt(call, summary, tq, sp->hdr.serial); + rxrpc_input_soft_rtt(call, summary, tq); rxrpc_input_soft_ack_tq(call, summary, tq, extracted, RXRPC_NR_TXQUEUE, seq - RXRPC_NR_TXQUEUE, &lowest_nak); extracted = ~0UL; @@ -1016,7 +1013,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) struct rxrpc_ack_summary summary = { 0 }; struct rxrpc_acktrailer trailer; struct rxrpc_skb_priv *sp = rxrpc_skb(skb); - rxrpc_serial_t ack_serial; rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt; int nr_acks, offset, ioffset; @@ -1024,14 +1020,14 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) offset = sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket); - ack_serial = sp->hdr.serial; - first_soft_ack = sp->ack.first_ack; - prev_pkt = sp->ack.prev_ack; - nr_acks = sp->ack.nr_acks; - hard_ack = first_soft_ack - 1; - summary.acked_serial = sp->ack.acked_serial; - summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ? 
- sp->ack.reason : RXRPC_ACK__INVALID); + summary.ack_serial = sp->hdr.serial; + first_soft_ack = sp->ack.first_ack; + prev_pkt = sp->ack.prev_ack; + nr_acks = sp->ack.nr_acks; + hard_ack = first_soft_ack - 1; + summary.acked_serial = sp->ack.acked_serial; + summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ? + sp->ack.reason : RXRPC_ACK__INVALID); trace_rxrpc_rx_ack(call, sp); rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]); @@ -1066,7 +1062,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) /* Discard any out-of-order or duplicate ACKs (outside lock). */ if (!rxrpc_is_ack_valid(call, hard_ack, prev_pkt)) { - trace_rxrpc_rx_discard_ack(call, ack_serial, hard_ack, prev_pkt); + trace_rxrpc_rx_discard_ack(call, summary.ack_serial, hard_ack, prev_pkt); goto send_response; } @@ -1100,10 +1096,10 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (hard_ack + 1 == 0) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero); - if (summary.acked_serial != 0) { + if (summary.acked_serial) { if (summary.ack_reason == RXRPC_ACK_PING_RESPONSE) rxrpc_complete_rtt_probe(call, call->acks_latest_ts, - summary.acked_serial, ack_serial, + summary.acked_serial, summary.ack_serial, rxrpc_rtt_rx_ping_response); else summary.rtt_sample_avail = true; @@ -1127,7 +1123,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_sack_overflow); if (after(hard_ack, call->tx_bottom)) { - if (rxrpc_rotate_tx_window(call, hard_ack, &summary, ack_serial)) { + if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) { rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ack); goto send_response; } @@ -1142,19 +1138,20 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) && call->acks_nr_sacks == call->tx_top - hard_ack && rxrpc_is_client_call(call)) - rxrpc_propose_ping(call, 
ack_serial, + rxrpc_propose_ping(call, summary.ack_serial, rxrpc_propose_ack_ping_for_lost_reply); rxrpc_congestion_management(call, &summary); if (summary.need_retransmit) - rxrpc_resend(call, ack_serial, summary.ack_reason == RXRPC_ACK_PING_RESPONSE); + rxrpc_resend(call, summary.ack_serial, + summary.ack_reason == RXRPC_ACK_PING_RESPONSE); send_response: if (summary.ack_reason == RXRPC_ACK_PING) - rxrpc_send_ACK(call, RXRPC_ACK_PING_RESPONSE, ack_serial, + rxrpc_send_ACK(call, RXRPC_ACK_PING_RESPONSE, summary.ack_serial, rxrpc_propose_ack_respond_to_ping); else if (sp->hdr.flags & RXRPC_REQUEST_ACK) - rxrpc_send_ACK(call, RXRPC_ACK_REQUESTED, ack_serial, + rxrpc_send_ACK(call, RXRPC_ACK_REQUESTED, summary.ack_serial, rxrpc_propose_ack_respond_to_ack); } @@ -1165,7 +1162,7 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb) { struct rxrpc_ack_summary summary = { 0 }; - if (rxrpc_rotate_tx_window(call, call->tx_top, &summary, 0)) + if (rxrpc_rotate_tx_window(call, call->tx_top, &summary)) rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall); } From patchwork Wed Dec 4 07:47:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893354 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 35/39] rxrpc: Add a reason indicator to the tx_data tracepoint Date: Wed, 4 Dec 2024 07:47:03 +0000 Message-ID: <20241204074710.990092-36-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Patchwork-Delegate: kuba@kernel.org Add an indicator to the rxrpc_tx_data tracepoint to record what triggered the transmission of a particular packet. At present, the only triggers are normal transmission and retransmission, and the tracepoint is also used to record loss injection; a future patch will add TLP-induced (re-)transmission as a further trigger. Signed-off-by: David Howells cc: Marc Dionne cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 21 ++++++++++++++------- net/rxrpc/ar-internal.h | 1 + net/rxrpc/call_event.c | 12 ++++++++---- net/rxrpc/output.c | 6 +++--- 4 files changed, 26 insertions(+), 14 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 326a4c257aea..d79623fff746 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -302,6 +302,11 @@ EM(rxrpc_txqueue_rotate_last, "RLS") \ E_(rxrpc_txqueue_wait, "WAI") +#define rxrpc_txdata_traces \ + EM(rxrpc_txdata_inject_loss, " *INJ-LOSS*") \ + EM(rxrpc_txdata_new_data, " ") \ + E_(rxrpc_txdata_retransmit, " *RETRANS*") + #define rxrpc_receive_traces \ EM(rxrpc_receive_end, "END") \ EM(rxrpc_receive_front, "FRN") \ @@ -534,6 +539,7 @@ enum rxrpc_timer_trace { rxrpc_timer_traces } __mode(byte); enum rxrpc_tq_trace { rxrpc_tq_traces } __mode(byte); enum rxrpc_tx_point { rxrpc_tx_points } __mode(byte); enum rxrpc_txbuf_trace { rxrpc_txbuf_traces } __mode(byte); +enum rxrpc_txdata_trace { rxrpc_txdata_traces } __mode(byte); enum rxrpc_txqueue_trace { rxrpc_txqueue_traces } __mode(byte); #endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */ @@ -572,6 +578,7 @@ rxrpc_timer_traces; rxrpc_tq_traces; rxrpc_tx_points; rxrpc_txbuf_traces; +rxrpc_txdata_traces; rxrpc_txqueue_traces; /* @@ -1222,9 +1229,10 @@ TRACE_EVENT(rxrpc_tx_packet, TRACE_EVENT(rxrpc_tx_data, TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, - rxrpc_serial_t serial, unsigned int flags, bool lose), + rxrpc_serial_t serial, unsigned int flags, + enum rxrpc_txdata_trace trace), - TP_ARGS(call, seq, serial, flags, lose), + TP_ARGS(call, seq, serial, flags, trace), TP_STRUCT__entry( __field(unsigned int, call) @@ -1233,7 +1241,7 @@ TRACE_EVENT(rxrpc_tx_data, __field(u32, cid) __field(u32, call_id) __field(u16, flags) - __field(bool, lose) + __field(enum rxrpc_txdata_trace, 
trace) ), TP_fast_assign( @@ -1243,18 +1251,17 @@ TRACE_EVENT(rxrpc_tx_data, __entry->seq = seq; __entry->serial = serial; __entry->flags = flags; - __entry->lose = lose; + __entry->trace = trace; ), - TP_printk("c=%08x DATA %08x:%08x %08x q=%08x fl=%02x%s%s", + TP_printk("c=%08x DATA %08x:%08x %08x q=%08x fl=%02x%s", __entry->call, __entry->cid, __entry->call_id, __entry->serial, __entry->seq, __entry->flags & RXRPC_TXBUF_WIRE_FLAGS, - __entry->flags & RXRPC_TXBUF_RESENT ? " *RETRANS*" : "", - __entry->lose ? " *LOSE*" : "") + __print_symbolic(__entry->trace, rxrpc_txdata_traces)) ); TRACE_EVENT(rxrpc_tx_ack, diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index aa240b4b4bec..139575032ae2 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -883,6 +883,7 @@ struct rxrpc_send_data_req { rxrpc_seq_t seq; /* Sequence of first data */ int n; /* Number of DATA packets to glue into jumbo */ bool did_send; /* T if did actually send */ + int /* enum rxrpc_txdata_trace */ trace; }; #include diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 39772459426b..99d9502564cc 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -101,6 +101,7 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_ { struct rxrpc_send_data_req req = { .now = ktime_get_real(), + .trace = rxrpc_txdata_retransmit, }; struct rxrpc_txqueue *tq = call->tx_queue; ktime_t lowest_xmit_ts = KTIME_MAX; @@ -269,7 +270,8 @@ static unsigned int rxrpc_tx_window_space(struct rxrpc_call *call) /* * Transmit some as-yet untransmitted data. 
*/ -static void rxrpc_transmit_fresh_data(struct rxrpc_call *call) +static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, + enum rxrpc_txdata_trace trace) { int space = rxrpc_tx_window_space(call); @@ -284,6 +286,7 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call) .now = ktime_get_real(), .seq = call->tx_transmitted + 1, .n = 0, + .trace = trace, }; struct rxrpc_txqueue *tq; struct rxrpc_txbuf *txb; @@ -332,7 +335,8 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call) } } -static void rxrpc_transmit_some_data(struct rxrpc_call *call) +static void rxrpc_transmit_some_data(struct rxrpc_call *call, + enum rxrpc_txdata_trace trace) { switch (__rxrpc_call_state(call)) { case RXRPC_CALL_SERVER_ACK_REQUEST: @@ -349,7 +353,7 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call) rxrpc_inc_stat(call->rxnet, stat_tx_data_underflow); return; } - rxrpc_transmit_fresh_data(call); + rxrpc_transmit_fresh_data(call, trace); break; default: return; @@ -463,7 +467,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) resend = true; } - rxrpc_transmit_some_data(call); + rxrpc_transmit_some_data(call, rxrpc_txdata_new_data); now = ktime_get_real(); t = ktime_sub(call->keepalive_at, now); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index a7de8a02f419..2633f955d1d0 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -511,7 +511,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, len += sizeof(*jumbo); } - trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags | flags, false); + trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, req->trace); kv->iov_len = len; return len; } @@ -655,8 +655,8 @@ void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req if ((lose++ & 7) == 7) { ret = 0; - trace_rxrpc_tx_data(call, txb->seq, txb->serial, - txb->flags, true); + trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags, + rxrpc_txdata_inject_loss); conn->peer->last_tx_at 
= ktime_get_seconds(); goto done; } From patchwork Wed Dec 4 07:47:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893355 X-Patchwork-Delegate: kuba@kernel.org From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 36/39] rxrpc: Add a reason indicator to the tx_ack tracepoint Date: Wed, 4 Dec 2024 07:47:04 +0000 Message-ID: <20241204074710.990092-37-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Patchwork-Delegate: kuba@kernel.org Record the reason for the transmission of an ACK in the rxrpc_tx_ack tracepoint, and not just in the rxrpc_propose_ack tracepoint. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- include/trace/events/rxrpc.h | 13 +++++++++---- net/rxrpc/conn_event.c | 3 ++- net/rxrpc/output.c | 2 +- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index d79623fff746..0cfc8e1baf1f 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -375,6 +375,7 @@ EM(rxrpc_propose_ack_processing_op, "ProcOp ") \ EM(rxrpc_propose_ack_respond_to_ack, "Rsp2Ack") \ EM(rxrpc_propose_ack_respond_to_ping, "Rsp2Png") \ + EM(rxrpc_propose_ack_retransmit, "Retrans") \ EM(rxrpc_propose_ack_retry_tx, "RetryTx") \ EM(rxrpc_propose_ack_rotate_rx, "RxAck ") \ EM(rxrpc_propose_ack_rx_idle, "RxIdle ") \ @@ -1267,9 +1268,10 @@ TRACE_EVENT(rxrpc_tx_data, TRACE_EVENT(rxrpc_tx_ack, TP_PROTO(unsigned int call, rxrpc_serial_t serial, rxrpc_seq_t ack_first, rxrpc_serial_t ack_serial, - u8 reason, u8 n_acks, u16 rwind), + u8 reason, u8 n_acks, u16 rwind, + enum rxrpc_propose_ack_trace trace), - TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks, rwind), + TP_ARGS(call, 
serial, ack_first, ack_serial, reason, n_acks, rwind, trace), TP_STRUCT__entry( __field(unsigned int, call) @@ -1279,6 +1281,7 @@ TRACE_EVENT(rxrpc_tx_ack, __field(u8, reason) __field(u8, n_acks) __field(u16, rwind) + __field(enum rxrpc_propose_ack_trace, trace) ), TP_fast_assign( @@ -1289,16 +1292,18 @@ TRACE_EVENT(rxrpc_tx_ack, __entry->reason = reason; __entry->n_acks = n_acks; __entry->rwind = rwind; + __entry->trace = trace; ), - TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u rw=%u", + TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u rw=%u %s", __entry->call, __entry->serial, __print_symbolic(__entry->reason, rxrpc_ack_names), __entry->ack_first, __entry->ack_serial, __entry->n_acks, - __entry->rwind) + __entry->rwind, + __print_symbolic(__entry->trace, rxrpc_propose_ack_traces)) ); TRACE_EVENT(rxrpc_receive, diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 6b29a294ee07..713e04394ceb 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -177,7 +177,8 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, trace_rxrpc_tx_ack(chan->call_debug_id, serial, ntohl(pkt.ack.firstPacket), ntohl(pkt.ack.serial), - pkt.ack.reason, 0, rxrpc_rx_window_size); + pkt.ack.reason, 0, rxrpc_rx_window_size, + rxrpc_propose_ack_retransmit); break; default: diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 2633f955d1d0..74c3ff55b482 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -267,7 +267,7 @@ static void rxrpc_send_ack_packet(struct rxrpc_call *call, int nr_kv, size_t len trace_rxrpc_tx_ack(call->debug_id, serial, ntohl(ack->firstPacket), ntohl(ack->serial), ack->reason, ack->nAcks, - ntohl(trailer->rwind)); + ntohl(trailer->rwind), why); rxrpc_inc_stat(call->rxnet, stat_tx_ack_send); From patchwork Wed Dec 4 07:47:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893356 X-Patchwork-Delegate: kuba@kernel.org 
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 37/39] rxrpc: Manage RTT per-call rather than per-peer
Date: Wed, 4 Dec 2024 07:47:05 +0000
Message-ID: <20241204074710.990092-38-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

Manage the determination of RTT on a per-call (i.e. per-RPC op) basis
rather than on a per-peer basis, where the samples were averaged across
all calls going to that peer.

The problem is that the RTT measurements from the initial packets on a
call may be skewed because the server may do some setting up (such as
getting a lock on a file) before accepting the rest of the data in the
RPC.  Further, the RTT may be affected by server-side file operations,
for instance if a large amount of data is being written or read.

Note: When handling FS.StoreData-type RPCs, for example, the server uses
the userStatus field in the header of ACK packets as supplementary flow
control to aid in managing this.  AF_RXRPC does not yet support this,
but support for it should be added.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h |  2 +-
 net/rxrpc/ar-internal.h      | 39 +++++++++-------
 net/rxrpc/call_event.c       | 18 +++----
 net/rxrpc/call_object.c      |  2 +
 net/rxrpc/input.c            | 10 ++--
 net/rxrpc/output.c           | 14 +++---
 net/rxrpc/peer_object.c      |  9 +---
 net/rxrpc/proc.c             |  6 +--
 net/rxrpc/rtt.c              | 91 ++++++++++++++++++------------------
 net/rxrpc/sendmsg.c          |  2 +-
 10 files changed, 97 insertions(+), 96 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 0cfc8e1baf1f..71df5c48a413 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -1453,7 +1453,7 @@ TRACE_EVENT(rxrpc_rtt_rx,
 		    __entry->rtt = rtt;
 		    __entry->srtt = srtt;
 		    __entry->rto = rto;
-		    __entry->min_rtt = minmax_get(&call->peer->min_rtt)
+		    __entry->min_rtt = minmax_get(&call->min_rtt)
 			   ),
 
 	    TP_printk("c=%08x [%d] %s sr=%08x rr=%08x rtt=%u srtt=%u rto=%u min=%u",
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 139575032ae2..a9d732ba6df0 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -366,20 +366,9 @@ struct rxrpc_peer {
 	unsigned short		hdrsize;	/* header size (IP + UDP + RxRPC) */
 	unsigned short		tx_seg_max;	/* Maximum number of transmissable segments */
 
-	/* calculated RTT cache */
-#define RXRPC_RTT_CACHE_SIZE 32
-	spinlock_t		rtt_input_lock;	/* RTT lock for input routine */
-	ktime_t			rtt_last_req;	/* Time of last RTT request */
-	unsigned int		rtt_count;	/* Number of samples we've got */
-	unsigned int		rtt_taken;	/* Number of samples taken (wrapping) */
-	struct minmax		min_rtt;	/* Estimated minimum RTT */
-
-	u32			srtt_us;	/* smoothed round trip time << 3 in usecs */
-	u32			mdev_us;	/* medium deviation */
-	u32			mdev_max_us;	/* maximal mdev for the last rtt period */
-	u32			rttvar_us;	/* smoothed mdev_max */
-	u32			rto_us;		/* Retransmission timeout in usec */
-	u8			backoff;	/* Backoff timeout (as shift) */
+	/* Calculated RTT cache */
+	unsigned int		recent_srtt_us;
+	unsigned int		recent_rto_us;
 
 	u8			cong_ssthresh;	/* Congestion slow-start threshold */
 };
@@ -765,6 +754,18 @@ struct rxrpc_call {
 	rxrpc_serial_t		acks_highest_serial; /* Highest serial number ACK'd */
 	unsigned short		acks_nr_sacks;	/* Number of soft acks recorded */
 	unsigned short		acks_nr_snacks;	/* Number of soft nacks recorded */
+
+	/* Calculated RTT cache */
+	ktime_t			rtt_last_req;	/* Time of last RTT request */
+	unsigned int		rtt_count;	/* Number of samples we've got */
+	unsigned int		rtt_taken;	/* Number of samples taken (wrapping) */
+	struct minmax		min_rtt;	/* Estimated minimum RTT */
+	u32			srtt_us;	/* smoothed round trip time << 3 in usecs */
+	u32			mdev_us;	/* medium deviation */
+	u32			mdev_max_us;	/* maximal mdev for the last rtt period */
+	u32			rttvar_us;	/* smoothed mdev_max */
+	u32			rto_us;		/* Retransmission timeout in usec */
+	u8			backoff;	/* Backoff timeout (as shift) */
 };
 
 /*
@@ -1287,10 +1288,12 @@ static inline int rxrpc_abort_eproto(struct rxrpc_call *call,
 /*
  * rtt.c
  */
-void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace, int,
-			rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t);
-ktime_t rxrpc_get_rto_backoff(struct rxrpc_peer *peer, bool retrans);
-void rxrpc_peer_init_rtt(struct rxrpc_peer *);
+void rxrpc_call_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why,
+			int rtt_slot,
+			rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial,
+			ktime_t send_time, ktime_t resp_time);
+ktime_t rxrpc_get_rto_backoff(struct rxrpc_call *call, bool retrans);
+void rxrpc_call_init_rtt(struct rxrpc_call *call);
 
 /*
  * rxkad.c
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 99d9502564cc..7af275544251 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -44,8 +44,8 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *call, rxrpc_serial_t serial,
 
 	trace_rxrpc_propose_ack(call, why, RXRPC_ACK_DELAY, serial);
 
-	if (call->peer->srtt_us)
-		delay = (call->peer->srtt_us >> 3) * NSEC_PER_USEC;
+	if (call->srtt_us)
+		delay = (call->srtt_us >> 3) * NSEC_PER_USEC;
 	else
 		delay = ms_to_ktime(READ_ONCE(rxrpc_soft_ack_delay));
 	ktime_add_ms(delay, call->tx_backoff);
@@ -105,7 +105,7 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_
 	};
 	struct rxrpc_txqueue *tq = call->tx_queue;
 	ktime_t lowest_xmit_ts = KTIME_MAX;
-	ktime_t rto = rxrpc_get_rto_backoff(call->peer, false);
+	ktime_t rto = rxrpc_get_rto_backoff(call, false);
 	bool unacked = false;
 
 	_enter("{%d,%d}", call->tx_bottom, call->tx_top);
@@ -195,7 +195,7 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_
 	} while ((tq = tq->next));
 
 	if (lowest_xmit_ts < KTIME_MAX) {
-		ktime_t delay = rxrpc_get_rto_backoff(call->peer, req.did_send);
+		ktime_t delay = rxrpc_get_rto_backoff(call, req.did_send);
 		ktime_t resend_at = ktime_add(lowest_xmit_ts, delay);
 
 		_debug("delay %llu %lld", delay, ktime_sub(resend_at, req.now));
@@ -216,7 +216,7 @@ void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_
 	 */
 	if (!req.did_send) {
 		ktime_t next_ping = ktime_add_us(call->acks_latest_ts,
-						 call->peer->srtt_us >> 3);
+						 call->srtt_us >> 3);
 
 		if (ktime_sub(next_ping, req.now) <= 0)
 			rxrpc_send_ACK(call, RXRPC_ACK_PING, 0,
@@ -366,8 +366,8 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call,
  */
 static void rxrpc_send_initial_ping(struct rxrpc_call *call)
 {
-	if (call->peer->rtt_count < 3 ||
-	    ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000),
+	if (call->rtt_count < 3 ||
+	    ktime_before(ktime_add_ms(call->rtt_last_req, 1000),
 			 ktime_get_real()))
 		rxrpc_send_ACK(call, RXRPC_ACK_PING, 0,
 			       rxrpc_propose_ack_ping_for_params);
@@ -499,10 +499,10 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
 					rxrpc_propose_ack_rx_idle);
 
 	if (call->ackr_nr_unacked > 2) {
-		if (call->peer->rtt_count < 3)
+		if (call->rtt_count < 3)
 			rxrpc_send_ACK(call, RXRPC_ACK_PING, 0,
 				       rxrpc_propose_ack_ping_for_rtt);
-		else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000),
+		else if (ktime_before(ktime_add_ms(call->rtt_last_req, 1000),
 				      ktime_get_real()))
 			rxrpc_send_ACK(call, RXRPC_ACK_PING, 0,
 				       rxrpc_propose_ack_ping_for_old_rtt);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 75cd0b06e14c..fb4ee0d2e9e1 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -176,6 +176,8 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	call->cong_cwnd = RXRPC_MIN_CWND;
 	call->cong_ssthresh = RXRPC_TX_MAX_WINDOW;
 
+	rxrpc_call_init_rtt(call);
+
 	call->rxnet = rxnet;
 	call->rtt_avail = RXRPC_CALL_RTT_AVAIL_MASK;
 	atomic_inc(&rxnet->nr_calls);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 036cf440b63b..9f308bd512e9 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -71,11 +71,11 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
 		/* We analyse the number of packets that get ACK'd per RTT
 		 * period and increase the window if we managed to fill it.
 		 */
-		if (call->peer->rtt_count == 0)
+		if (call->rtt_count == 0)
 			goto out;
 		if (ktime_before(call->acks_latest_ts,
 				 ktime_add_us(call->cong_tstamp,
-					      call->peer->srtt_us >> 3)))
+					      call->srtt_us >> 3)))
 			goto out_no_clear_ca;
 		summary->change = rxrpc_cong_rtt_window_end;
 		call->cong_tstamp = call->acks_latest_ts;
@@ -179,7 +179,7 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call)
 	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
 		return;
 
-	rtt = ns_to_ktime(call->peer->srtt_us * (1000 / 8));
+	rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8));
 	now = ktime_get_real();
 	if (!ktime_before(ktime_add(call->tx_last_sent, rtt), now))
 		return;
@@ -200,7 +200,7 @@ static void rxrpc_add_data_rtt_sample(struct rxrpc_call *call,
 				      struct rxrpc_txqueue *tq,
 				      int ix)
 {
-	rxrpc_peer_add_rtt(call, rxrpc_rtt_rx_data_ack, -1,
+	rxrpc_call_add_rtt(call, rxrpc_rtt_rx_data_ack, -1,
 			   summary->acked_serial, summary->ack_serial,
 			   ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]),
 			   call->acks_latest_ts);
@@ -725,7 +725,7 @@ static void rxrpc_complete_rtt_probe(struct rxrpc_call *call,
 		clear_bit(i + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail);
 		smp_mb(); /* Read data before setting avail bit */
 		set_bit(i, &call->rtt_avail);
-		rxrpc_peer_add_rtt(call, type, i, acked_serial, ack_serial,
+		rxrpc_call_add_rtt(call, type, i, acked_serial, ack_serial,
 				   sent_at, resp_time);
 		matched = true;
 	}
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 74c3ff55b482..ecaf3becee40 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -234,7 +234,7 @@ static int rxrpc_fill_out_ack(struct rxrpc_call *call, int nr_kv, u8 ack_reason,
 	if (ack_reason == RXRPC_ACK_PING)
 		rxrpc_begin_rtt_probe(call, *_ack_serial, now, rxrpc_rtt_tx_ping);
 	if (whdr->flags & RXRPC_REQUEST_ACK)
-		call->peer->rtt_last_req = now;
+		call->rtt_last_req = now;
 	rxrpc_set_keepalive(call, now);
 	return nr_kv;
 }
@@ -473,9 +473,9 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call,
 		why = rxrpc_reqack_slow_start;
 	else if (call->tx_winsize <= 2)
 		why = rxrpc_reqack_small_txwin;
-	else if (call->peer->rtt_count < 3 && txb->seq & 1)
+	else if (call->rtt_count < 3)
 		why = rxrpc_reqack_more_rtt;
-	else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), ktime_get_real()))
+	else if (ktime_before(ktime_add_ms(call->rtt_last_req, 1000), ktime_get_real()))
 		why = rxrpc_reqack_old_rtt;
 	else if (!last && !after(READ_ONCE(call->send_top), txb->seq))
 		why = rxrpc_reqack_app_stall;
@@ -487,7 +487,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call,
 	if (why != rxrpc_reqack_no_srv_last) {
 		flags |= RXRPC_REQUEST_ACK;
 		trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_data, -1, serial);
-		call->peer->rtt_last_req = req->now;
+		call->rtt_last_req = req->now;
 	}
 dont_set_request_ack:
 
@@ -576,8 +576,8 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se
 	}
 
 	/* Set timeouts */
-	if (call->peer->rtt_count > 1) {
-		ktime_t delay = rxrpc_get_rto_backoff(call->peer, false);
+	if (call->rtt_count > 1) {
+		ktime_t delay = rxrpc_get_rto_backoff(call, false);
 
 		call->ack_lost_at = ktime_add(req->now, delay);
 		trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack);
@@ -590,7 +590,7 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se
 		trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_expect_rx);
 	}
 	if (call->resend_at == KTIME_MAX) {
-		ktime_t delay = rxrpc_get_rto_backoff(call->peer, false);
+		ktime_t delay = rxrpc_get_rto_backoff(call, false);
 
 		call->resend_at = ktime_add(req->now, delay);
 		trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_resend);
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 27b34ed4d76a..e1c63129586b 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -235,12 +235,9 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp,
 		peer->service_conns = RB_ROOT;
 		seqlock_init(&peer->service_conn_lock);
 		spin_lock_init(&peer->lock);
-		spin_lock_init(&peer->rtt_input_lock);
 		seqcount_init(&peer->mtu_lock);
 		peer->debug_id = atomic_inc_return(&rxrpc_debug_id);
-
-		rxrpc_peer_init_rtt(peer);
-
+		peer->recent_srtt_us = UINT_MAX;
 		peer->cong_ssthresh = RXRPC_TX_MAX_WINDOW;
 		trace_rxrpc_peer(peer->debug_id, 1, why);
 	}
@@ -283,8 +280,6 @@ static void rxrpc_init_peer(struct rxrpc_local *local, struct rxrpc_peer *peer,
 
 	peer->max_data = peer->if_mtu - peer->hdrsize;
 	rxrpc_assess_MTU_size(local, peer);
-
-	peer->rtt_last_req = ktime_get_real();
 }
 
 /*
@@ -496,7 +491,7 @@ EXPORT_SYMBOL(rxrpc_kernel_get_call_peer);
  */
 unsigned int rxrpc_kernel_get_srtt(const struct rxrpc_peer *peer)
 {
-	return peer->rtt_count > 0 ? peer->srtt_us >> 3 : UINT_MAX;
+	return READ_ONCE(peer->recent_srtt_us);
 }
 EXPORT_SYMBOL(rxrpc_kernel_get_srtt);
 
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 5f974ec13d69..d803562ca0ac 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -296,15 +296,15 @@ static int rxrpc_peer_seq_show(struct seq_file *seq, void *v)
 
 	now = ktime_get_seconds();
 	seq_printf(seq,
-		   "UDP %-47.47s %-47.47s %3u %4u %5u %6llus %8u %8u\n",
+		   "UDP %-47.47s %-47.47s %3u %4u %5u %6llus %8d %8d\n",
 		   lbuff,
 		   rbuff,
 		   refcount_read(&peer->ref),
 		   peer->cong_ssthresh,
 		   peer->max_data,
 		   now - peer->last_tx_at,
-		   peer->srtt_us >> 3,
-		   peer->rto_us);
+		   READ_ONCE(peer->recent_srtt_us),
+		   READ_ONCE(peer->recent_rto_us));
 
 	return 0;
 }
diff --git a/net/rxrpc/rtt.c b/net/rxrpc/rtt.c
index aff75e168de8..7474f88d7b18 100644
--- a/net/rxrpc/rtt.c
+++ b/net/rxrpc/rtt.c
@@ -15,14 +15,14 @@
 #define RXRPC_TIMEOUT_INIT ((unsigned int)(1 * USEC_PER_SEC)) /* RFC6298 2.1 initial RTO value */
 #define rxrpc_jiffies32 ((u32)jiffies)		/* As rxrpc_jiffies32 */
 
-static u32 rxrpc_rto_min_us(struct rxrpc_peer *peer)
+static u32 rxrpc_rto_min_us(struct rxrpc_call *call)
 {
 	return 200;
 }
 
-static u32 __rxrpc_set_rto(const struct rxrpc_peer *peer)
+static u32 __rxrpc_set_rto(const struct rxrpc_call *call)
 {
-	return (peer->srtt_us >> 3) + peer->rttvar_us;
+	return (call->srtt_us >> 3) + call->rttvar_us;
 }
 
 static u32 rxrpc_bound_rto(u32 rto)
@@ -40,10 +40,10 @@ static u32 rxrpc_bound_rto(u32 rto)
  * To save cycles in the RFC 1323 implementation it was better to break
  * it up into three procedures. -- erics
  */
-static void rxrpc_rtt_estimator(struct rxrpc_peer *peer, long sample_rtt_us)
+static void rxrpc_rtt_estimator(struct rxrpc_call *call, long sample_rtt_us)
 {
 	long m = sample_rtt_us; /* RTT */
-	u32 srtt = peer->srtt_us;
+	u32 srtt = call->srtt_us;
 
 	/*	The following amusing code comes from Jacobson's
 	 *	article in SIGCOMM '88.  Note that rtt and mdev
@@ -66,7 +66,7 @@ static void rxrpc_rtt_estimator(struct rxrpc_peer *peer, long sample_rtt_us)
 		srtt += m;		/* rtt = 7/8 rtt + 1/8 new */
 		if (m < 0) {
 			m = -m;		/* m is now abs(error) */
-			m -= (peer->mdev_us >> 2);   /* similar update on mdev */
+			m -= (call->mdev_us >> 2);   /* similar update on mdev */
 			/* This is similar to one of Eifel findings.
 			 * Eifel blocks mdev updates when rtt decreases.
 			 * This solution is a bit different: we use finer gain
@@ -78,31 +78,31 @@ static void rxrpc_rtt_estimator(struct rxrpc_peer *peer, long sample_rtt_us)
 			if (m > 0)
 				m >>= 3;
 		} else {
-			m -= (peer->mdev_us >> 2);   /* similar update on mdev */
+			m -= (call->mdev_us >> 2);   /* similar update on mdev */
 		}
 
-		peer->mdev_us += m;		/* mdev = 3/4 mdev + 1/4 new */
-		if (peer->mdev_us > peer->mdev_max_us) {
-			peer->mdev_max_us = peer->mdev_us;
-			if (peer->mdev_max_us > peer->rttvar_us)
-				peer->rttvar_us = peer->mdev_max_us;
+		call->mdev_us += m;		/* mdev = 3/4 mdev + 1/4 new */
+		if (call->mdev_us > call->mdev_max_us) {
+			call->mdev_max_us = call->mdev_us;
+			if (call->mdev_max_us > call->rttvar_us)
+				call->rttvar_us = call->mdev_max_us;
 		}
 	} else {
 		/* no previous measure. */
 		srtt = m << 3;		/* take the measured time to be rtt */
-		peer->mdev_us = m << 1;	/* make sure rto = 3*rtt */
-		peer->rttvar_us = umax(peer->mdev_us, rxrpc_rto_min_us(peer));
-		peer->mdev_max_us = peer->rttvar_us;
+		call->mdev_us = m << 1;	/* make sure rto = 3*rtt */
+		call->rttvar_us = umax(call->mdev_us, rxrpc_rto_min_us(call));
+		call->mdev_max_us = call->rttvar_us;
 	}
 
-	peer->srtt_us = umax(srtt, 1);
+	call->srtt_us = umax(srtt, 1);
 }
 
 /*
  * Calculate rto without backoff.  This is the second half of Van Jacobson's
  * routine referred to above.
  */
-static void rxrpc_set_rto(struct rxrpc_peer *peer)
+static void rxrpc_set_rto(struct rxrpc_call *call)
 {
 	u32 rto;
 
@@ -113,7 +113,7 @@ static void rxrpc_set_rto(struct rxrpc_peer *peer)
 	 * is invisible. Actually, Linux-2.4 also generates erratic
 	 * ACKs in some circumstances.
 	 */
-	rto = __rxrpc_set_rto(peer);
+	rto = __rxrpc_set_rto(call);
 
 	/* 2. Fixups made earlier cannot be right.
 	 *    If we do not estimate RTO correctly without them,
@@ -124,73 +124,73 @@ static void rxrpc_set_rto(struct rxrpc_peer *peer)
 	/* NOTE: clamping at RXRPC_RTO_MIN is not required, current algo
 	 * guarantees that rto is higher.
 	 */
-	peer->rto_us = rxrpc_bound_rto(rto);
+	call->rto_us = rxrpc_bound_rto(rto);
 }
 
-static void rxrpc_update_rtt_min(struct rxrpc_peer *peer, ktime_t resp_time, long rtt_us)
+static void rxrpc_update_rtt_min(struct rxrpc_call *call, ktime_t resp_time, long rtt_us)
 {
 	/* Window size 5mins in approx usec (ipv4.sysctl_tcp_min_rtt_wlen) */
 	u32 wlen_us = 5ULL * NSEC_PER_SEC / 1024;
 
-	minmax_running_min(&peer->min_rtt, wlen_us, resp_time / 1024,
+	minmax_running_min(&call->min_rtt, wlen_us, resp_time / 1024,
 			   (u32)rtt_us ? : jiffies_to_usecs(1));
 }
 
-static void rxrpc_ack_update_rtt(struct rxrpc_peer *peer, ktime_t resp_time, long rtt_us)
+static void rxrpc_ack_update_rtt(struct rxrpc_call *call, ktime_t resp_time, long rtt_us)
 {
 	if (rtt_us < 0)
 		return;
 
 	/* Update RACK min RTT [RFC8985 6.1 Step 1]. */
-	rxrpc_update_rtt_min(peer, resp_time, rtt_us);
+	rxrpc_update_rtt_min(call, resp_time, rtt_us);
 
-	rxrpc_rtt_estimator(peer, rtt_us);
-	rxrpc_set_rto(peer);
+	rxrpc_rtt_estimator(call, rtt_us);
+	rxrpc_set_rto(call);
 
 	/* Only reset backoff on valid RTT measurement [RFC6298]. */
-	peer->backoff = 0;
+	call->backoff = 0;
 }
 
 /*
  * Add RTT information to cache.  This is called in softirq mode and has
- * exclusive access to the peer RTT data.
+ * exclusive access to the call RTT data.
  */
-void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why,
+void rxrpc_call_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why,
 			int rtt_slot,
 			rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial,
 			ktime_t send_time, ktime_t resp_time)
 {
-	struct rxrpc_peer *peer = call->peer;
 	s64 rtt_us;
 
 	rtt_us = ktime_to_us(ktime_sub(resp_time, send_time));
 	if (rtt_us < 0)
 		return;
 
-	spin_lock(&peer->rtt_input_lock);
-	rxrpc_ack_update_rtt(peer, resp_time, rtt_us);
-	if (peer->rtt_count < 3)
-		peer->rtt_count++;
-	peer->rtt_taken++;
-	spin_unlock(&peer->rtt_input_lock);
+	rxrpc_ack_update_rtt(call, resp_time, rtt_us);
+	if (call->rtt_count < 3)
+		call->rtt_count++;
+	call->rtt_taken++;
+
+	WRITE_ONCE(call->peer->recent_srtt_us, call->srtt_us / 8);
+	WRITE_ONCE(call->peer->recent_rto_us, call->rto_us);
 
 	trace_rxrpc_rtt_rx(call, why, rtt_slot, send_serial, resp_serial,
-			   rtt_us, peer->srtt_us, peer->rto_us);
+			   rtt_us, call->srtt_us, call->rto_us);
 }
 
 /*
  * Get the retransmission timeout to set in nanoseconds, backing it off each
  * time we retransmit.
  */
-ktime_t rxrpc_get_rto_backoff(struct rxrpc_peer *peer, bool retrans)
+ktime_t rxrpc_get_rto_backoff(struct rxrpc_call *call, bool retrans)
 {
 	u64 timo_us;
-	u32 backoff = READ_ONCE(peer->backoff);
+	u32 backoff = READ_ONCE(call->backoff);
 
-	timo_us = peer->rto_us;
+	timo_us = call->rto_us;
 	timo_us <<= backoff;
 	if (retrans && timo_us * 2 <= RXRPC_RTO_MAX)
-		WRITE_ONCE(peer->backoff, backoff + 1);
+		WRITE_ONCE(call->backoff, backoff + 1);
 
 	if (timo_us < 1)
 		timo_us = 1;
@@ -198,10 +198,11 @@ ktime_t rxrpc_get_rto_backoff(struct rxrpc_peer *peer, bool retrans)
 	return ns_to_ktime(timo_us * NSEC_PER_USEC);
 }
 
-void rxrpc_peer_init_rtt(struct rxrpc_peer *peer)
+void rxrpc_call_init_rtt(struct rxrpc_call *call)
 {
-	peer->rto_us = RXRPC_TIMEOUT_INIT;
-	peer->mdev_us = RXRPC_TIMEOUT_INIT;
-	peer->backoff = 0;
-	//minmax_reset(&peer->rtt_min, rxrpc_jiffies32, ~0U);
+	call->rtt_last_req = KTIME_MIN;
+	call->rto_us = RXRPC_TIMEOUT_INIT;
+	call->mdev_us = RXRPC_TIMEOUT_INIT;
+	call->backoff = 0;
+	//minmax_reset(&call->rtt_min, rxrpc_jiffies32, ~0U);
 }
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index df501a7c92fa..c4c8b718cafa 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -134,7 +134,7 @@ static int rxrpc_wait_for_tx_window_waitall(struct rxrpc_sock *rx,
 	rxrpc_seq_t tx_start, tx_win;
 	signed long rtt, timeout;
 
-	rtt = READ_ONCE(call->peer->srtt_us) >> 3;
+	rtt = READ_ONCE(call->srtt_us) >> 3;
 	rtt = usecs_to_jiffies(rtt) * 2;
 	if (rtt < 2)
 		rtt = 2;

From patchwork Wed Dec 4 07:47:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13893357
X-Patchwork-Delegate: kuba@kernel.org
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Marc Dionne, Yunsheng Lin, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v2 38/39] rxrpc: Fix request for an ACK when cwnd is minimum
Date: Wed, 4 Dec 2024 07:47:06 +0000
Message-ID: <20241204074710.990092-39-dhowells@redhat.com>
In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com>
References: <20241204074710.990092-1-dhowells@redhat.com>

rxrpc_prepare_data_subpacket() sets the REQUEST-ACK flag on the outgoing
DATA packet under a number of circumstances, including, theoretically,
when the cwnd is at its minimum (or less).
However, the minimum in this function is hard-coded as 2, but the actual minimum is RXRPC_MIN_CWND (which is currently 4) and so this never occurs. Without this, we will miss the request of some ACKs, potentially leading to a transmission stall until a timeout occurs on one side or the other that leads to an ACK being generated. Fix the function to use RXRPC_MIN_CWND rather than a hard-coded number. Signed-off-by: David Howells cc: Marc Dionne cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org --- net/rxrpc/output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index ecaf3becee40..f934551a9b1c 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -469,7 +469,7 @@ static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, why = rxrpc_reqack_ack_lost; else if (txb->flags & RXRPC_TXBUF_RESENT) why = rxrpc_reqack_retrans; - else if (call->cong_ca_state == RXRPC_CA_SLOW_START && call->cong_cwnd <= 2) + else if (call->cong_ca_state == RXRPC_CA_SLOW_START && call->cong_cwnd <= RXRPC_MIN_CWND) why = rxrpc_reqack_slow_start; else if (call->tx_winsize <= 2) why = rxrpc_reqack_small_txwin; From patchwork Wed Dec 4 07:47:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13893358 X-Patchwork-Delegate: kuba@kernel.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6903F20CCC3 for ; Wed, 4 Dec 2024 07:50:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733298609; cv=none; 
b=miD9dN50kDS8On/IbyN44AbL5U9CPha0niEcnUopiIZfttPFAtPEAh/b6kAKz0KBwRK3Xjp8+WScgdcZaNgO3ZOUMwuDPj2TMEK0iBVM3XhQTORUHFBSxLoULKxCYiGOURW1ZugMRkQyBPcyrsFRLsN5RIoNj/DcCSLJkWHdYEg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733298609; c=relaxed/simple; bh=hcWTjLPD7bF/9jbW5F35a+04O5P2gvnoi7Up+fFnQI8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Xrlnd3EQpqwiCdpaOxWcBDx8MPQIVtUZ5Nh8ECi5l6byvNGnhTBYgC+CgtuEMVDK9R2pk+XKtsQZl6+XD3UurdOhz18++f+5Qk6/OEsGFGYpBIKxr5BYuDQR0h+FtYYFCb6uMRAo3gG+r9T4M+hF1cLfOVpDa2ER9L1D1gBKJ1c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=E6yr52dv; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="E6yr52dv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733298604; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8sE8viHJT82V3GutCDJ/1rA4KrIAwXgZgn6d+BPYX7s=; b=E6yr52dvDTUrNh1HqhCUaxou4TKeqIXLt6l4JUPR/hf7qNpAZXoAaofYyOYNege5MnSW9g geZMH3XzEijW1gf2d5HLJSuDaDkv3x0HApt9PVvwqCPHOldwA4ISPWHJhDaIFXLeJIqLm6 /z3zV3iWIwfWCargcaqM574V3kFvQqc= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-564-ON1gsPHTMkCIi5IQSqiCGA-1; Wed, 04 
Dec 2024 02:49:59 -0500 X-MC-Unique: ON1gsPHTMkCIi5IQSqiCGA-1 X-Mimecast-MFC-AGG-ID: ON1gsPHTMkCIi5IQSqiCGA Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 406631955E89; Wed, 4 Dec 2024 07:49:58 +0000 (UTC) Received: from warthog.procyon.org.uk.com (unknown [10.42.28.48]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 7DC8519560A2; Wed, 4 Dec 2024 07:49:55 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , Marc Dionne , Yunsheng Lin , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next v2 39/39] rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985] Date: Wed, 4 Dec 2024 07:47:07 +0000 Message-ID: <20241204074710.990092-40-dhowells@redhat.com> In-Reply-To: <20241204074710.990092-1-dhowells@redhat.com> References: <20241204074710.990092-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Patchwork-Delegate: kuba@kernel.org When an rxrpc call is in its transmission phase and is sending a lot of packets, stalls occasionally occur that cause severe performance degradation (eg. increasing the transmission time for a 256MiB payload from 0.7s to 2.5s over a 10G link). rxrpc already implements TCP-style congestion control [RFC5681] and this helps mitigate the effects, but occasionally we're missing a time event that deals with a missing ACK, leading to a stall until the RTO expires. 
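Tail Loss Probe (TLP), one half of RACK-TLP, targets this kind of stall. Instead of waiting for the RTO, the sender arms a shorter probe timeout (PTO). When the PTO fires, the sender transmits or retransmits a segment to elicit an ACK. The sketch below shows the PTO calculation as we read RFC 8985 section 7.2. It is an illustration, not rxrpc's own rxrpc_tlp_calc_pto(). The `tlp_state` struct, its field names and the microsecond time unit are all hypothetical. `max_ack_delay` stands in for the RFC's worst-case delayed-ACK allowance (WCDelAckT), which the patch's `call->tlp_max_ack_delay` field appears to mirror.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical time unit for this sketch: microseconds. */
typedef int64_t usec_t;

/* Hypothetical sender state; field names are illustrative only. */
struct tlp_state {
	bool have_srtt;           /* True once an RTT sample has been taken */
	usec_t srtt;              /* Smoothed round-trip time */
	unsigned int flight_size; /* Segments sent but not yet acknowledged */
	usec_t max_ack_delay;     /* Worst-case delayed-ACK allowance (WCDelAckT) */
	usec_t rto_expiry;        /* Absolute time at which the RTO would fire */
};

/*
 * Compute the probe timeout relative to @now, following our reading of
 * RFC 8985 section 7.2.  Returns 0 if the RTO is already due.
 */
usec_t tlp_calc_pto(const struct tlp_state *s, usec_t now)
{
	usec_t pto;

	assert(s->srtt >= 0);

	if (s->have_srtt) {
		pto = 2 * s->srtt;
		/* A lone segment may be held back by the receiver's delayed ACK. */
		if (s->flight_size == 1)
			pto += s->max_ack_delay;
	} else {
		pto = 1000000;	/* No RTT sample yet: fall back to 1 second */
	}

	/* The probe must not fire later than the RTO would. */
	if (now + pto > s->rto_expiry)
		pto = s->rto_expiry - now;
	return pto > 0 ? pto : 0;
}
```

Take a 500 µs SRTT with one segment in flight and a 200 ms delayed-ACK allowance. In that case the probe is scheduled about 201 ms out. With ten segments in flight, the allowance is dropped and the PTO is just 2*SRTT (1 ms), which is far sooner than a typical RTO.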
Fix this by implementing RACK/TLP in rxrpc.

Signed-off-by: David Howells
cc: Marc Dionne
cc: "David S. Miller"
cc: Eric Dumazet
cc: Jakub Kicinski
cc: Paolo Abeni
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
---
 include/trace/events/rxrpc.h | 342 ++++++++++++++++++++++++++--
 net/rxrpc/Makefile           |   1 +
 net/rxrpc/ar-internal.h      | 107 ++++++++-
 net/rxrpc/call_event.c       | 247 +++++++--------------
 net/rxrpc/call_object.c      |   3 +-
 net/rxrpc/input.c            | 117 ++++++----
 net/rxrpc/input_rack.c       | 418 +++++++++++++++++++++++++++++++++++
 net/rxrpc/io_thread.c        |   1 +
 net/rxrpc/output.c           |  41 +++-
 9 files changed, 1041 insertions(+), 236 deletions(-)
 create mode 100644 net/rxrpc/input_rack.c

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 71df5c48a413..2f119d18a061 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -305,7 +305,9 @@
 #define rxrpc_txdata_traces \
 	EM(rxrpc_txdata_inject_loss, " *INJ-LOSS*") \
 	EM(rxrpc_txdata_new_data, " ") \
-	E_(rxrpc_txdata_retransmit, " *RETRANS*")
+	EM(rxrpc_txdata_retransmit, " *RETRANS*") \
+	EM(rxrpc_txdata_tlp_new_data, " *TLP-NEW*") \
+	E_(rxrpc_txdata_tlp_retransmit, " *TLP-RETRANS*")
 
 #define rxrpc_receive_traces \
 	EM(rxrpc_receive_end, "END") \
@@ -353,11 +355,12 @@
 	EM(rxrpc_timer_trace_hard, "HardLimit") \
 	EM(rxrpc_timer_trace_idle, "IdleLimit") \
 	EM(rxrpc_timer_trace_keepalive, "KeepAlive") \
-	EM(rxrpc_timer_trace_lost_ack, "LostAck ") \
 	EM(rxrpc_timer_trace_ping, "DelayPing") \
-	EM(rxrpc_timer_trace_resend, "Resend ") \
-	EM(rxrpc_timer_trace_resend_reset, "ResendRst") \
-	E_(rxrpc_timer_trace_resend_tx, "ResendTx ")
+	EM(rxrpc_timer_trace_rack_off, "RACK-OFF ") \
+	EM(rxrpc_timer_trace_rack_zwp, "RACK-ZWP ") \
+	EM(rxrpc_timer_trace_rack_reo, "RACK-Reo ") \
+	EM(rxrpc_timer_trace_rack_tlp_pto, "TLP-PTO ") \
+	E_(rxrpc_timer_trace_rack_rto, "RTO ")
 
 #define rxrpc_propose_ack_traces \
 	EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \
@@ -478,9 +481,9 @@
EM(rxrpc_txbuf_put_rotated, "PUT ROTATED") \ EM(rxrpc_txbuf_put_send_aborted, "PUT SEND-X ") \ EM(rxrpc_txbuf_put_trans, "PUT TRANS ") \ + EM(rxrpc_txbuf_see_lost, "SEE LOST ") \ EM(rxrpc_txbuf_see_out_of_step, "OUT-OF-STEP") \ - EM(rxrpc_txbuf_see_send_more, "SEE SEND+ ") \ - E_(rxrpc_txbuf_see_unacked, "SEE UNACKED") + E_(rxrpc_txbuf_see_send_more, "SEE SEND+ ") #define rxrpc_tq_traces \ EM(rxrpc_tq_alloc, "ALLOC") \ @@ -505,6 +508,24 @@ EM(rxrpc_rotate_trace_sack, "soft-ack") \ E_(rxrpc_rotate_trace_snak, "soft-nack") +#define rxrpc_rack_timer_modes \ + EM(RXRPC_CALL_RACKTIMER_OFF, "---") \ + EM(RXRPC_CALL_RACKTIMER_RACK_REORDER, "REO") \ + EM(RXRPC_CALL_RACKTIMER_TLP_PTO, "TLP") \ + E_(RXRPC_CALL_RACKTIMER_RTO, "RTO") + +#define rxrpc_tlp_probe_traces \ + EM(rxrpc_tlp_probe_trace_busy, "busy") \ + EM(rxrpc_tlp_probe_trace_transmit_new, "transmit-new") \ + E_(rxrpc_tlp_probe_trace_retransmit, "retransmit") + +#define rxrpc_tlp_ack_traces \ + EM(rxrpc_tlp_ack_trace_acked, "acked") \ + EM(rxrpc_tlp_ack_trace_dup_acked, "dup-acked") \ + EM(rxrpc_tlp_ack_trace_hard_beyond, "hard-beyond") \ + EM(rxrpc_tlp_ack_trace_incomplete, "incomplete") \ + E_(rxrpc_tlp_ack_trace_new_data, "new-data") + /* * Generate enums for tracing information. 
*/ @@ -537,6 +558,8 @@ enum rxrpc_rtt_tx_trace { rxrpc_rtt_tx_traces } __mode(byte); enum rxrpc_sack_trace { rxrpc_sack_traces } __mode(byte); enum rxrpc_skb_trace { rxrpc_skb_traces } __mode(byte); enum rxrpc_timer_trace { rxrpc_timer_traces } __mode(byte); +enum rxrpc_tlp_ack_trace { rxrpc_tlp_ack_traces } __mode(byte); +enum rxrpc_tlp_probe_trace { rxrpc_tlp_probe_traces } __mode(byte); enum rxrpc_tq_trace { rxrpc_tq_traces } __mode(byte); enum rxrpc_tx_point { rxrpc_tx_points } __mode(byte); enum rxrpc_txbuf_trace { rxrpc_txbuf_traces } __mode(byte); @@ -567,6 +590,7 @@ rxrpc_conn_traces; rxrpc_local_traces; rxrpc_pmtud_reduce_traces; rxrpc_propose_ack_traces; +rxrpc_rack_timer_modes; rxrpc_receive_traces; rxrpc_recvmsg_traces; rxrpc_req_ack_traces; @@ -576,6 +600,8 @@ rxrpc_rtt_tx_traces; rxrpc_sack_traces; rxrpc_skb_traces; rxrpc_timer_traces; +rxrpc_tlp_ack_traces; +rxrpc_tlp_probe_traces; rxrpc_tq_traces; rxrpc_tx_points; rxrpc_txbuf_traces; @@ -618,6 +644,20 @@ TRACE_EVENT(rxrpc_local, __entry->usage) ); +TRACE_EVENT(rxrpc_iothread_rx, + TP_PROTO(struct rxrpc_local *local, unsigned int nr_rx), + TP_ARGS(local, nr_rx), + TP_STRUCT__entry( + __field(unsigned int, local) + __field(unsigned int, nr_rx) + ), + TP_fast_assign( + __entry->local = local->debug_id; + __entry->nr_rx = nr_rx; + ), + TP_printk("L=%08x nrx=%u", __entry->local, __entry->nr_rx) + ); + TRACE_EVENT(rxrpc_peer, TP_PROTO(unsigned int peer_debug_id, int ref, enum rxrpc_peer_trace why), @@ -1684,16 +1724,15 @@ TRACE_EVENT(rxrpc_drop_ack, TRACE_EVENT(rxrpc_retransmit, TP_PROTO(struct rxrpc_call *call, struct rxrpc_send_data_req *req, - struct rxrpc_txbuf *txb, ktime_t expiry), + struct rxrpc_txbuf *txb), - TP_ARGS(call, req, txb, expiry), + TP_ARGS(call, req, txb), TP_STRUCT__entry( __field(unsigned int, call) __field(unsigned int, qbase) __field(rxrpc_seq_t, seq) __field(rxrpc_serial_t, serial) - __field(ktime_t, expiry) ), TP_fast_assign( @@ -1701,15 +1740,13 @@ TRACE_EVENT(rxrpc_retransmit, 
__entry->qbase = req->tq->qbase; __entry->seq = req->seq; __entry->serial = txb->serial; - __entry->expiry = expiry; ), - TP_printk("c=%08x tq=%x q=%x r=%x xp=%lld", + TP_printk("c=%08x tq=%x q=%x r=%x", __entry->call, __entry->qbase, __entry->seq, - __entry->serial, - ktime_to_us(__entry->expiry)) + __entry->serial) ); TRACE_EVENT(rxrpc_congest, @@ -1767,9 +1804,9 @@ TRACE_EVENT(rxrpc_congest, ); TRACE_EVENT(rxrpc_reset_cwnd, - TP_PROTO(struct rxrpc_call *call, ktime_t now), + TP_PROTO(struct rxrpc_call *call, ktime_t since_last_tx, ktime_t rtt), - TP_ARGS(call, now), + TP_ARGS(call, since_last_tx, rtt), TP_STRUCT__entry( __field(unsigned int, call) @@ -1779,6 +1816,7 @@ TRACE_EVENT(rxrpc_reset_cwnd, __field(rxrpc_seq_t, hard_ack) __field(rxrpc_seq_t, prepared) __field(ktime_t, since_last_tx) + __field(ktime_t, rtt) __field(bool, has_data) ), @@ -1789,18 +1827,20 @@ TRACE_EVENT(rxrpc_reset_cwnd, __entry->extra = call->cong_extra; __entry->hard_ack = call->acks_hard_ack; __entry->prepared = call->send_top - call->tx_bottom; - __entry->since_last_tx = ktime_sub(now, call->tx_last_sent); + __entry->since_last_tx = since_last_tx; + __entry->rtt = rtt; __entry->has_data = call->tx_bottom != call->tx_top; ), - TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu d=%u", + TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu/%llu d=%u", __entry->call, __entry->hard_ack, __print_symbolic(__entry->ca_state, rxrpc_ca_states), __entry->cwnd, __entry->extra, __entry->prepared, - ktime_to_ns(__entry->since_last_tx), + ktime_to_us(__entry->since_last_tx), + ktime_to_us(__entry->rtt), __entry->has_data) ); @@ -1925,6 +1965,32 @@ TRACE_EVENT(rxrpc_resend, __entry->transmitted) ); +TRACE_EVENT(rxrpc_resend_lost, + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, unsigned long lost), + + TP_ARGS(call, tq, lost), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_seq_t, qbase) + __field(u8, nr_rep) + __field(unsigned long, lost) + ), + + TP_fast_assign( + 
__entry->call = call->debug_id; + __entry->qbase = tq->qbase; + __entry->nr_rep = tq->nr_reported_acks; + __entry->lost = lost; + ), + + TP_printk("c=%08x tq=%x lost=%016lx nr=%u", + __entry->call, + __entry->qbase, + __entry->lost, + __entry->nr_rep) + ); + TRACE_EVENT(rxrpc_rotate, TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, struct rxrpc_ack_summary *summary, rxrpc_seq_t seq, @@ -2363,6 +2429,244 @@ TRACE_EVENT(rxrpc_pmtud_reduce, __entry->serial, __entry->max_data) ); +TRACE_EVENT(rxrpc_rack, + TP_PROTO(struct rxrpc_call *call, ktime_t timo), + + TP_ARGS(call, timo), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_serial_t, ack_serial) + __field(rxrpc_seq_t, seq) + __field(enum rxrpc_rack_timer_mode, mode) + __field(unsigned short, nr_sent) + __field(unsigned short, nr_lost) + __field(unsigned short, nr_resent) + __field(unsigned short, nr_sacked) + __field(ktime_t, timo) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->ack_serial = call->rx_serial; + __entry->seq = call->rack_end_seq; + __entry->mode = call->rack_timer_mode; + __entry->nr_sent = call->tx_nr_sent; + __entry->nr_lost = call->tx_nr_lost; + __entry->nr_resent = call->tx_nr_resent; + __entry->nr_sacked = call->acks_nr_sacks; + __entry->timo = timo; + ), + + TP_printk("c=%08x r=%08x q=%08x %s slrs=%u,%u,%u,%u t=%lld", + __entry->call, __entry->ack_serial, __entry->seq, + __print_symbolic(__entry->mode, rxrpc_rack_timer_modes), + __entry->nr_sent, __entry->nr_lost, + __entry->nr_resent, __entry->nr_sacked, + ktime_to_us(__entry->timo)) + ); + +TRACE_EVENT(rxrpc_rack_update, + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary), + + TP_ARGS(call, summary), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_serial_t, ack_serial) + __field(rxrpc_seq_t, seq) + __field(int, xmit_ts) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->ack_serial = call->rx_serial; + __entry->seq = call->rack_end_seq; 
+ __entry->xmit_ts = ktime_sub(call->acks_latest_ts, call->rack_xmit_ts); + ), + + TP_printk("c=%08x r=%08x q=%08x xt=%lld", + __entry->call, __entry->ack_serial, __entry->seq, + ktime_to_us(__entry->xmit_ts)) + ); + +TRACE_EVENT(rxrpc_rack_scan_loss, + TP_PROTO(struct rxrpc_call *call), + + TP_ARGS(call), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(ktime_t, rack_rtt) + __field(ktime_t, rack_reo_wnd) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->rack_rtt = call->rack_rtt; + __entry->rack_reo_wnd = call->rack_reo_wnd; + ), + + TP_printk("c=%08x rtt=%lld reow=%lld", + __entry->call, ktime_to_us(__entry->rack_rtt), + ktime_to_us(__entry->rack_reo_wnd)) + ); + +TRACE_EVENT(rxrpc_rack_scan_loss_tq, + TP_PROTO(struct rxrpc_call *call, const struct rxrpc_txqueue *tq, + unsigned long nacks), + + TP_ARGS(call, tq, nacks), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_seq_t, qbase) + __field(unsigned long, nacks) + __field(unsigned long, lost) + __field(unsigned long, retrans) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->qbase = tq->qbase; + __entry->nacks = nacks; + __entry->lost = tq->segment_lost; + __entry->retrans = tq->segment_retransmitted; + ), + + TP_printk("c=%08x q=%08x n=%lx l=%lx r=%lx", + __entry->call, __entry->qbase, + __entry->nacks, __entry->lost, __entry->retrans) + ); + +TRACE_EVENT(rxrpc_rack_detect_loss, + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, + rxrpc_seq_t seq), + + TP_ARGS(call, summary, seq), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_serial_t, ack_serial) + __field(rxrpc_seq_t, seq) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->ack_serial = call->rx_serial; + __entry->seq = seq; + ), + + TP_printk("c=%08x r=%08x q=%08x", + __entry->call, __entry->ack_serial, __entry->seq) + ); + +TRACE_EVENT(rxrpc_rack_mark_loss_tq, + TP_PROTO(struct rxrpc_call *call, const struct rxrpc_txqueue *tq), 
+ + TP_ARGS(call, tq), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_seq_t, qbase) + __field(rxrpc_seq_t, trans) + __field(unsigned long, acked) + __field(unsigned long, lost) + __field(unsigned long, retrans) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->qbase = tq->qbase; + __entry->trans = call->tx_transmitted; + __entry->acked = tq->segment_acked; + __entry->lost = tq->segment_lost; + __entry->retrans = tq->segment_retransmitted; + ), + + TP_printk("c=%08x tq=%08x txq=%08x a=%lx l=%lx r=%lx", + __entry->call, __entry->qbase, __entry->trans, + __entry->acked, __entry->lost, __entry->retrans) + ); + +TRACE_EVENT(rxrpc_tlp_probe, + TP_PROTO(struct rxrpc_call *call, enum rxrpc_tlp_probe_trace trace), + + TP_ARGS(call, trace), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_serial_t, serial) + __field(rxrpc_seq_t, seq) + __field(enum rxrpc_tlp_probe_trace, trace) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->serial = call->tlp_serial; + __entry->seq = call->tlp_seq; + __entry->trace = trace; + ), + + TP_printk("c=%08x r=%08x pq=%08x %s", + __entry->call, __entry->serial, __entry->seq, + __print_symbolic(__entry->trace, rxrpc_tlp_probe_traces)) + ); + +TRACE_EVENT(rxrpc_tlp_ack, + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, + enum rxrpc_tlp_ack_trace trace), + + TP_ARGS(call, summary, trace), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(rxrpc_serial_t, serial) + __field(rxrpc_seq_t, tlp_seq) + __field(rxrpc_seq_t, hard_ack) + __field(enum rxrpc_tlp_ack_trace, trace) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->serial = call->tlp_serial; + __entry->tlp_seq = call->tlp_seq; + __entry->hard_ack = call->acks_hard_ack; + __entry->trace = trace; + ), + + TP_printk("c=%08x r=%08x pq=%08x hq=%08x %s", + __entry->call, __entry->serial, + __entry->tlp_seq, __entry->hard_ack, + __print_symbolic(__entry->trace, 
rxrpc_tlp_ack_traces)) + ); + +TRACE_EVENT(rxrpc_rack_timer, + TP_PROTO(struct rxrpc_call *call, ktime_t delay, bool exp), + + TP_ARGS(call, delay, exp), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(bool, exp) + __field(enum rxrpc_rack_timer_mode, mode) + __field(ktime_t, delay) + ), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->exp = exp; + __entry->mode = call->rack_timer_mode; + __entry->delay = delay; + ), + + TP_printk("c=%08x %s %s to=%lld", + __entry->call, + __entry->exp ? "Exp" : "Set", + __print_symbolic(__entry->mode, rxrpc_rack_timer_modes), + ktime_to_us(__entry->delay)) + ); + #undef EM #undef E_ diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile index ac5caf5a48e1..210b75e3179e 100644 --- a/net/rxrpc/Makefile +++ b/net/rxrpc/Makefile @@ -16,6 +16,7 @@ rxrpc-y := \ conn_object.o \ conn_service.o \ input.o \ + input_rack.o \ insecure.o \ io_thread.o \ key.o \ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a9d732ba6df0..0c0a3c89dba3 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -621,6 +621,18 @@ enum rxrpc_ca_state { NR__RXRPC_CA_STATES } __mode(byte); +/* + * Current purpose of call RACK timer. According to the RACK-TLP protocol + * [RFC8985], the transmission timer (call->rack_timo_at) may only be used for + * one of these at once. 
+ */ +enum rxrpc_rack_timer_mode { + RXRPC_CALL_RACKTIMER_OFF, /* Timer not running */ + RXRPC_CALL_RACKTIMER_RACK_REORDER, /* RACK reordering timer */ + RXRPC_CALL_RACKTIMER_TLP_PTO, /* TLP timeout */ + RXRPC_CALL_RACKTIMER_RTO, /* Retransmission timeout */ +} __mode(byte); + /* * RxRPC call definition * - matched by { connection, call_id } @@ -638,8 +650,7 @@ struct rxrpc_call { struct mutex user_mutex; /* User access mutex */ struct sockaddr_rxrpc dest_srx; /* Destination address */ ktime_t delay_ack_at; /* When DELAY ACK needs to happen */ - ktime_t ack_lost_at; /* When ACK is figured as lost */ - ktime_t resend_at; /* When next resend needs to happen */ + ktime_t rack_timo_at; /* When the RACK timer expires (see rack_timer_mode) */ ktime_t ping_at; /* When next to send a ping */ ktime_t keepalive_at; /* When next to send a keepalive ping */ ktime_t expect_rx_by; /* When we expect to get a packet by */ @@ -695,8 +706,12 @@ struct rxrpc_call { rxrpc_seq_t tx_bottom; /* First packet in buffer */ rxrpc_seq_t tx_transmitted; /* Highest packet transmitted */ rxrpc_seq_t tx_top; /* Highest Tx slot allocated. */ + rxrpc_serial_t tx_last_serial; /* Serial of last DATA transmitted */ u16 tx_backoff; /* Delay to insert due to Tx failure (ms) */ - u8 tx_winsize; /* Maximum size of Tx window */ + u16 tx_nr_sent; /* Number of packets sent, but unacked */ + u16 tx_nr_lost; /* Number of packets marked lost */ + u16 tx_nr_resent; /* Number of packets resent, but unacked */ + u16 tx_winsize; /* Maximum size of Tx window */ #define RXRPC_TX_MAX_WINDOW 128 u8 tx_jumbo_max; /* Maximum subpkts peer will accept */ ktime_t tx_last_sent; /* Last time a transmission occurred */ @@ -725,6 +740,25 @@ struct rxrpc_call { u16 cong_cumul_acks; /* Cumulative ACK count */ ktime_t cong_tstamp; /* Last time cwnd was changed */ + /* RACK-TLP [RFC8985] state.
*/ + ktime_t rack_xmit_ts; /* Latest transmission timestamp */ + ktime_t rack_rtt; /* RTT of most recently ACK'd segment */ + ktime_t rack_rtt_ts; /* Timestamp of rack_rtt */ + ktime_t rack_reo_wnd; /* Reordering window */ + unsigned int rack_reo_wnd_mult; /* Multiplier applied to rack_reo_wnd */ + int rack_reo_wnd_persist; /* Num loss recoveries before reset reo_wnd */ + rxrpc_seq_t rack_fack; /* Highest sequence so far ACK'd */ + rxrpc_seq_t rack_end_seq; /* Highest sequence seen */ + rxrpc_seq_t rack_dsack_round; /* DSACK opt recv'd in latest roundtrip */ + bool rack_dsack_round_none; /* T if dsack_round is "None" */ + bool rack_reordering_seen; /* T if detected reordering event */ + enum rxrpc_rack_timer_mode rack_timer_mode; /* Current mode of RACK timer */ + bool tlp_is_retrans; /* T if unacked TLP retransmission */ + rxrpc_serial_t tlp_serial; /* Serial of TLP probe (or 0 if none in progress) */ + rxrpc_seq_t tlp_seq; /* Sequence of TLP probe */ + unsigned int tlp_rtt_taken; /* Last time RTT taken */ + ktime_t tlp_max_ack_delay; /* Sender budget for max delayed ACK interval */ + /* Receive-phase ACK management (ACKs we send). 
*/ u8 ackr_reason; /* reason to ACK */ u16 ackr_sack_base; /* Starting slot in SACK table ring */ @@ -783,6 +817,9 @@ struct rxrpc_ack_summary { bool retrans_timeo:1; /* T if reTx due to timeout happened */ bool need_retransmit:1; /* T if we need transmission */ bool rtt_sample_avail:1; /* T if RTT sample available */ + bool in_fast_or_rto_recovery:1; + bool exiting_fast_or_rto_recovery:1; + bool tlp_probe_acked:1; /* T if the TLP probe seq was acked */ u8 /*enum rxrpc_congest_change*/ change; }; @@ -864,6 +901,7 @@ struct rxrpc_txqueue { unsigned long segment_lost; /* Bit-per-buf: Set if declared lost */ unsigned long segment_retransmitted; /* Bit-per-buf: Set if retransmitted */ unsigned long rtt_samples; /* Bit-per-buf: Set if available for RTT */ + unsigned long ever_retransmitted; /* Bit-per-buf: Set if ever retransmitted */ /* The arrays we want to pack into as few cache lines as possible. */ struct { @@ -883,7 +921,9 @@ struct rxrpc_send_data_req { struct rxrpc_txqueue *tq; /* Tx queue segment holding first DATA */ rxrpc_seq_t seq; /* Sequence of first data */ int n; /* Number of DATA packets to glue into jumbo */ + bool retrans; /* T if this is a retransmission */ bool did_send; /* T if did actually send */ + bool tlp_probe; /* T if this is a TLP probe */ int /* enum rxrpc_txdata_trace */ trace; }; @@ -943,8 +983,9 @@ void rxrpc_propose_ping(struct rxrpc_call *call, u32 serial, enum rxrpc_propose_ack_trace why); void rxrpc_propose_delay_ACK(struct rxrpc_call *, rxrpc_serial_t, enum rxrpc_propose_ack_trace); -void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_response); - +void rxrpc_resend_tlp(struct rxrpc_call *call); +void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit, + enum rxrpc_txdata_trace trace); bool rxrpc_input_call_event(struct rxrpc_call *call); /* @@ -1123,6 +1164,32 @@ void rxrpc_congestion_degrade(struct rxrpc_call *); void rxrpc_input_call_packet(struct rxrpc_call *, struct sk_buff *); void 
rxrpc_implicit_end_call(struct rxrpc_call *, struct sk_buff *); +/* + * input_rack.c + */ +void rxrpc_input_rack_one(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned int ix); +void rxrpc_input_rack(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned long new_acks); +void rxrpc_rack_detect_loss_and_arm_timer(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary); +ktime_t rxrpc_tlp_calc_pto(struct rxrpc_call *call, ktime_t now); +void rxrpc_tlp_send_probe(struct rxrpc_call *call); +void rxrpc_tlp_process_ack(struct rxrpc_call *call, struct rxrpc_ack_summary *summary); +void rxrpc_rack_timer_expired(struct rxrpc_call *call, ktime_t overran_by); + +/* Initialise TLP state [RFC8985 7.1]. */ +static inline void rxrpc_tlp_init(struct rxrpc_call *call) +{ + call->tlp_serial = 0; + call->tlp_seq = call->acks_hard_ack; + call->tlp_is_retrans = false; +} + /* * io_thread.c */ @@ -1402,6 +1469,11 @@ static inline u32 latest(u32 seq1, u32 seq2) return after(seq1, seq2) ? seq1 : seq2; } +static inline bool rxrpc_seq_in_txq(const struct rxrpc_txqueue *tq, rxrpc_seq_t seq) +{ + return (seq & ~(RXRPC_NR_TXQUEUE - 1)) == tq->qbase; +} + static inline void rxrpc_queue_rx_call_packet(struct rxrpc_call *call, struct sk_buff *skb) { rxrpc_get_skb(skb, rxrpc_skb_get_call_rx); @@ -1409,6 +1481,31 @@ static inline void rxrpc_queue_rx_call_packet(struct rxrpc_call *call, struct sk rxrpc_poke_call(call, rxrpc_call_poke_rx_packet); } +/* + * Calculate how much space there is for transmitting more DATA packets.
+ */ +static inline unsigned int rxrpc_tx_window_space(const struct rxrpc_call *call) +{ + int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra); + int transmitted = call->tx_top - call->tx_bottom; + + return max(winsize - transmitted, 0); +} + +static inline unsigned int rxrpc_left_out(const struct rxrpc_call *call) +{ + return call->acks_nr_sacks + call->tx_nr_lost; +} + +/* + * Calculate the number of transmitted DATA packets assumed to be in flight + * [approx RFC6675]. + */ +static inline unsigned int rxrpc_tx_in_flight(const struct rxrpc_call *call) +{ + return call->tx_nr_sent - rxrpc_left_out(call) + call->tx_nr_resent; +} + /* * debug tracing */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 7af275544251..8e477f7f8850 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -54,35 +54,21 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *call, rxrpc_serial_t serial, trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_delayed_ack); } -/* - * Handle congestion being detected by the retransmit timeout. - */ -static void rxrpc_congestion_timeout(struct rxrpc_call *call) -{ - set_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags); -} - /* * Retransmit one or more packets. 
*/ static bool rxrpc_retransmit_data(struct rxrpc_call *call, - struct rxrpc_send_data_req *req, - ktime_t rto, bool skip_too_young) + struct rxrpc_send_data_req *req) { struct rxrpc_txqueue *tq = req->tq; unsigned int ix = req->seq & RXRPC_TXQ_MASK; struct rxrpc_txbuf *txb = tq->bufs[ix]; - ktime_t xmit_ts, resend_at; _enter("%x,%x,%x,%x", tq->qbase, req->seq, ix, txb->debug_id); - xmit_ts = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]); - resend_at = ktime_add(xmit_ts, rto); - trace_rxrpc_retransmit(call, req, txb, ktime_sub(resend_at, req->now)); - if (skip_too_young && ktime_after(resend_at, req->now)) - return false; + req->retrans = true; + trace_rxrpc_retransmit(call, req, txb); - __set_bit(ix, &tq->segment_retransmitted); txb->flags |= RXRPC_TXBUF_RESENT; rxrpc_send_data_packet(call, req); rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans); @@ -97,133 +83,76 @@ static bool rxrpc_retransmit_data(struct rxrpc_call *call, /* * Perform retransmission of NAK'd and unack'd packets. */ -void rxrpc_resend(struct rxrpc_call *call, rxrpc_serial_t ack_serial, bool ping_response) +static void rxrpc_resend(struct rxrpc_call *call) { struct rxrpc_send_data_req req = { .now = ktime_get_real(), .trace = rxrpc_txdata_retransmit, }; - struct rxrpc_txqueue *tq = call->tx_queue; - ktime_t lowest_xmit_ts = KTIME_MAX; - ktime_t rto = rxrpc_get_rto_backoff(call, false); - bool unacked = false; + struct rxrpc_txqueue *tq; _enter("{%d,%d}", call->tx_bottom, call->tx_top); - if (call->tx_bottom == call->tx_top) { - call->resend_at = KTIME_MAX; - trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend); - return; - } + trace_rxrpc_resend(call, call->acks_highest_serial); - trace_rxrpc_resend(call, ack_serial); - - /* Scan the transmission queue, looking for explicitly NAK'd packets. */ - do { - unsigned long naks = ~tq->segment_acked; - rxrpc_seq_t tq_top = tq->qbase + RXRPC_NR_TXQUEUE - 1; + /* Scan the transmission queue, looking for lost packets. 
*/ + for (tq = call->tx_queue; tq; tq = tq->next) { + unsigned long lost = tq->segment_lost; if (after(tq->qbase, call->tx_transmitted)) break; - if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE) - naks &= (1UL << tq->nr_reported_acks) - 1; - _debug("retr %16lx %u c=%08x [%x]", tq->segment_acked, tq->nr_reported_acks, call->debug_id, tq->qbase); - _debug("nack %16lx", naks); + _debug("lost %16lx", lost); - while (naks) { - unsigned int ix = __ffs(naks); + trace_rxrpc_resend_lost(call, tq, lost); + while (lost) { + unsigned int ix = __ffs(lost); struct rxrpc_txbuf *txb = tq->bufs[ix]; - __clear_bit(ix, &naks); - if (after(txb->serial, call->acks_highest_serial)) - continue; /* Ack point not yet reached */ - - rxrpc_see_txbuf(txb, rxrpc_txbuf_see_unacked); + __clear_bit(ix, &lost); + rxrpc_see_txbuf(txb, rxrpc_txbuf_see_lost); req.tq = tq; req.seq = tq->qbase + ix; req.n = 1; - rxrpc_retransmit_data(call, &req, rto, false); - } - - /* Anything after the soft-ACK table up to and including - * ack.previousPacket will get ACK'd or NACK'd in due course, - * so don't worry about those here. We do, however, need to - * consider retransmitting anything beyond that point. - */ - if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE && - after(tq_top, call->acks_prev_seq)) { - rxrpc_seq_t start = latest(call->acks_prev_seq, - tq->qbase + tq->nr_reported_acks); - rxrpc_seq_t stop = earliest(tq_top, call->tx_transmitted); - - _debug("unrep %x-%x", start, stop); - for (rxrpc_seq_t seq = start; before_eq(seq, stop); seq++) { - rxrpc_serial_t serial = tq->segment_serial[seq & RXRPC_TXQ_MASK]; - - if (ping_response && - before(serial, call->acks_highest_serial)) - break; /* Wasn't accounted for by a more recent ping. */ - req.tq = tq; - req.seq = seq; - req.n = 1; - if (rxrpc_retransmit_data(call, &req, rto, true)) - unacked = true; - } + rxrpc_retransmit_data(call, &req); } - - /* Work out the next retransmission timeout. 
*/ - if (ktime_before(tq->xmit_ts_base, lowest_xmit_ts)) { - unsigned int lowest_us = UINT_MAX; - - for (int i = 0; i < RXRPC_NR_TXQUEUE; i++) - if (!test_bit(i, &tq->segment_acked) && - tq->segment_xmit_ts[i] < lowest_us) - lowest_us = tq->segment_xmit_ts[i]; - _debug("lowest[%x] %llx %u", tq->qbase, tq->xmit_ts_base, lowest_us); - - if (lowest_us != UINT_MAX) { - ktime_t lowest_ns = ktime_add_us(tq->xmit_ts_base, lowest_us); - - if (ktime_before(lowest_ns, lowest_xmit_ts)) - lowest_xmit_ts = lowest_ns; - } - } - } while ((tq = tq->next)); - - if (lowest_xmit_ts < KTIME_MAX) { - ktime_t delay = rxrpc_get_rto_backoff(call, req.did_send); - ktime_t resend_at = ktime_add(lowest_xmit_ts, delay); - - _debug("delay %llu %lld", delay, ktime_sub(resend_at, req.now)); - call->resend_at = resend_at; - trace_rxrpc_timer_set(call, ktime_sub(resend_at, req.now), - rxrpc_timer_trace_resend_reset); - } else { - call->resend_at = KTIME_MAX; - trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend); } - if (unacked) - rxrpc_congestion_timeout(call); + rxrpc_get_rto_backoff(call, req.did_send); + _leave(""); +} - /* If there was nothing that needed retransmission then it's likely - * that an ACK got lost somewhere. Send a ping to find out instead of - * retransmitting data. - */ - if (!req.did_send) { - ktime_t next_ping = ktime_add_us(call->acks_latest_ts, - call->srtt_us >> 3); +/* + * Resend the highest-seq DATA packet so far transmitted for RACK-TLP [RFC8985 7.3]. + */ +void rxrpc_resend_tlp(struct rxrpc_call *call) +{ + struct rxrpc_send_data_req req = { + .now = ktime_get_real(), + .seq = call->tx_transmitted, + .n = 1, + .tlp_probe = true, + .trace = rxrpc_txdata_tlp_retransmit, + }; - if (ktime_sub(next_ping, req.now) <= 0) - rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, - rxrpc_propose_ack_ping_for_0_retrans); + /* There's a chance it'll be on the tail segment of the queue. 
*/ + req.tq = READ_ONCE(call->tx_qtail); + if (req.tq && + before(call->tx_transmitted, req.tq->qbase + RXRPC_NR_TXQUEUE)) { + rxrpc_retransmit_data(call, &req); + return; } - _leave(""); + for (req.tq = call->tx_queue; req.tq; req.tq = req.tq->next) { + if (after_eq(call->tx_transmitted, req.tq->qbase) && + before(call->tx_transmitted, req.tq->qbase + RXRPC_NR_TXQUEUE)) { + rxrpc_retransmit_data(call, &req); + return; + } + } } /* @@ -259,18 +188,10 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call) } } -static unsigned int rxrpc_tx_window_space(struct rxrpc_call *call) -{ - int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra); - int in_flight = call->tx_top - call->tx_bottom; - - return max(winsize - in_flight, 0); -} - /* - * Transmit some as-yet untransmitted data. + * Transmit some as-yet untransmitted data, to a maximum of the supplied limit. */ -static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, +static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, unsigned int limit, enum rxrpc_txdata_trace trace) { int space = rxrpc_tx_window_space(call); @@ -335,8 +256,8 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, } } -static void rxrpc_transmit_some_data(struct rxrpc_call *call, - enum rxrpc_txdata_trace trace) +void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit, + enum rxrpc_txdata_trace trace) { switch (__rxrpc_call_state(call)) { case RXRPC_CALL_SERVER_ACK_REQUEST: @@ -353,7 +274,7 @@ static void rxrpc_transmit_some_data(struct rxrpc_call *call, rxrpc_inc_stat(call->rxnet, stat_tx_data_underflow); return; } - rxrpc_transmit_fresh_data(call, trace); + rxrpc_transmit_fresh_data(call, limit, trace); break; default: return; @@ -380,7 +301,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) { struct sk_buff *skb; ktime_t now, t; - bool resend = false, did_receive = false, saw_ack = false; + bool did_receive = false, saw_ack = false; s32 abort_code; rxrpc_see_call(call, 
rxrpc_call_see_input); @@ -398,21 +319,33 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) goto out; } - while ((skb = __skb_dequeue(&call->rx_queue))) { - struct rxrpc_skb_priv *sp = rxrpc_skb(skb); + do { + skb = __skb_dequeue(&call->rx_queue); + if (skb) { + struct rxrpc_skb_priv *sp = rxrpc_skb(skb); + + if (__rxrpc_call_is_complete(call) || + skb->mark == RXRPC_SKB_MARK_ERROR) { + rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); + goto out; + } + + saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK; - if (__rxrpc_call_is_complete(call) || - skb->mark == RXRPC_SKB_MARK_ERROR) { + rxrpc_input_call_packet(call, skb); rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); - goto out; + did_receive = true; } - saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK; + t = ktime_sub(call->rack_timo_at, ktime_get_real()); + if (t <= 0) { + trace_rxrpc_timer_exp(call, t, + rxrpc_timer_trace_rack_off + call->rack_timer_mode); + call->rack_timo_at = KTIME_MAX; + rxrpc_rack_timer_expired(call, t); + } - rxrpc_input_call_packet(call, skb); - rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); - did_receive = true; - } + } while (!skb_queue_empty(&call->rx_queue)); /* If we see our async-event poke, check for timeout trippage. 
*/ now = ktime_get_real(); @@ -445,13 +378,6 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) rxrpc_propose_ack_delayed_ack); } - t = ktime_sub(call->ack_lost_at, now); - if (t <= 0) { - trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_lost_ack); - call->ack_lost_at = KTIME_MAX; - set_bit(RXRPC_CALL_EV_ACK_LOST, &call->events); - } - t = ktime_sub(call->ping_at, now); if (t <= 0) { trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_ping); @@ -460,15 +386,6 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) rxrpc_propose_ack_ping_for_keepalive); } - t = ktime_sub(call->resend_at, now); - if (t <= 0) { - trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_resend); - call->resend_at = KTIME_MAX; - resend = true; - } - - rxrpc_transmit_some_data(call, rxrpc_txdata_new_data); - now = ktime_get_real(); t = ktime_sub(call->keepalive_at, now); if (t <= 0) { @@ -478,21 +395,30 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) rxrpc_propose_ack_ping_for_keepalive); } + if (test_and_clear_bit(RXRPC_CALL_EV_INITIAL_PING, &call->events)) + rxrpc_send_initial_ping(call); + + rxrpc_transmit_some_data(call, UINT_MAX, rxrpc_txdata_new_data); + if (saw_ack) rxrpc_congestion_degrade(call); - if (test_and_clear_bit(RXRPC_CALL_EV_INITIAL_PING, &call->events)) - rxrpc_send_initial_ping(call); + if (did_receive && + (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_SEND_REQUEST || + __rxrpc_call_state(call) == RXRPC_CALL_SERVER_SEND_REPLY)) { + t = ktime_sub(call->rack_timo_at, ktime_get_real()); + trace_rxrpc_rack(call, t); + } /* Process events */ if (test_and_clear_bit(RXRPC_CALL_EV_ACK_LOST, &call->events)) rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, rxrpc_propose_ack_ping_for_lost_ack); - if (resend && + if (call->tx_nr_lost > 0 && __rxrpc_call_state(call) != RXRPC_CALL_CLIENT_RECV_REPLY && !test_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags)) - rxrpc_resend(call, 0, false); + rxrpc_resend(call); if (test_and_clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags)) rxrpc_send_ACK(call, 
RXRPC_ACK_IDLE, 0, @@ -520,8 +446,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) set(call->expect_req_by); set(call->expect_rx_by); set(call->delay_ack_at); - set(call->ack_lost_at); - set(call->resend_at); + set(call->rack_timo_at); set(call->keepalive_at); set(call->ping_at); diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index fb4ee0d2e9e1..5a543c3f6fb0 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -160,8 +160,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, call->ackr_window = 1; call->ackr_wtop = 1; call->delay_ack_at = KTIME_MAX; - call->ack_lost_at = KTIME_MAX; - call->resend_at = KTIME_MAX; + call->rack_timo_at = KTIME_MAX; call->ping_at = KTIME_MAX; call->keepalive_at = KTIME_MAX; call->expect_rx_by = KTIME_MAX; diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 9f308bd512e9..4974b5accafa 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -27,13 +27,13 @@ static void rxrpc_proto_abort(struct rxrpc_call *call, rxrpc_seq_t seq, } /* - * Do TCP-style congestion management [RFC 5681]. + * Do TCP-style congestion management [RFC5681]. 
*/ static void rxrpc_congestion_management(struct rxrpc_call *call, struct rxrpc_ack_summary *summary) { summary->change = rxrpc_cong_no_change; - summary->in_flight = (call->tx_top - call->tx_bottom) - call->acks_nr_sacks; + summary->in_flight = rxrpc_tx_in_flight(call); if (test_and_clear_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags)) { summary->retrans_timeo = true; @@ -106,9 +106,12 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, call->cong_extra = 0; call->cong_dup_acks = 0; summary->need_retransmit = true; + summary->in_fast_or_rto_recovery = true; goto out; case RXRPC_CA_FAST_RETRANSMIT: + rxrpc_tlp_init(call); + summary->in_fast_or_rto_recovery = true; if (!summary->new_low_snack) { if (summary->nr_new_sacks == 0) call->cong_cwnd += 1; @@ -121,8 +124,10 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, } else { summary->change = rxrpc_cong_progress; call->cong_cwnd = call->cong_ssthresh; - if (call->acks_nr_snacks == 0) + if (call->acks_nr_snacks == 0) { + summary->exiting_fast_or_rto_recovery = true; goto resume_normality; + } } goto out; @@ -171,7 +176,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, */ void rxrpc_congestion_degrade(struct rxrpc_call *call) { - ktime_t rtt, now; + ktime_t rtt, now, time_since; if (call->cong_ca_state != RXRPC_CA_SLOW_START && call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE) @@ -181,10 +186,11 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call) rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8)); now = ktime_get_real(); - if (!ktime_before(ktime_add(call->tx_last_sent, rtt), now)) + time_since = ktime_sub(now, call->tx_last_sent); + if (ktime_before(time_since, rtt)) return; - trace_rxrpc_reset_cwnd(call, now); + trace_rxrpc_reset_cwnd(call, time_since, rtt); rxrpc_inc_stat(call->rxnet, stat_tx_data_cwnd_reset); call->tx_last_sent = now; call->cong_ca_state = RXRPC_CA_SLOW_START; @@ -200,11 +206,11 @@ static void rxrpc_add_data_rtt_sample(struct 
rxrpc_call *call, struct rxrpc_txqueue *tq, int ix) { + ktime_t xmit_ts = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]); + rxrpc_call_add_rtt(call, rxrpc_rtt_rx_data_ack, -1, summary->acked_serial, summary->ack_serial, - ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]), - call->acks_latest_ts); - summary->rtt_sample_avail = false; + xmit_ts, call->acks_latest_ts); __clear_bit(ix, &tq->rtt_samples); /* Prevent repeat RTT sample */ } @@ -216,7 +222,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, { struct rxrpc_txqueue *tq = call->tx_queue; rxrpc_seq_t seq = call->tx_bottom + 1; - bool rot_last = false; + bool rot_last = false, trace = false; _enter("%x,%x", call->tx_bottom, to); @@ -250,14 +256,16 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, rot_last = true; } - if (summary->rtt_sample_avail && - summary->acked_serial == tq->segment_serial[ix] && + if (summary->acked_serial == tq->segment_serial[ix] && test_bit(ix, &tq->rtt_samples)) rxrpc_add_data_rtt_sample(call, summary, tq, ix); if (ix == tq->nr_reported_acks) { /* Packet directly hard ACK'd. */ tq->nr_reported_acks++; + rxrpc_input_rack_one(call, summary, tq, ix); + if (seq == call->tlp_seq) + summary->tlp_probe_acked = true; summary->nr_new_hacks++; __set_bit(ix, &tq->segment_acked); trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_hack); @@ -268,11 +276,21 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, } else { /* Soft NAK -> hard ACK. 
*/ call->acks_nr_snacks--; + rxrpc_input_rack_one(call, summary, tq, ix); + if (seq == call->tlp_seq) + summary->tlp_probe_acked = true; summary->nr_new_hacks++; __set_bit(ix, &tq->segment_acked); trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_snak); } + call->tx_nr_sent--; + if (__test_and_clear_bit(ix, &tq->segment_lost)) + call->tx_nr_lost--; + if (__test_and_clear_bit(ix, &tq->segment_retransmitted)) + call->tx_nr_resent--; + __clear_bit(ix, &tq->ever_retransmitted); + rxrpc_put_txbuf(tq->bufs[ix], rxrpc_txbuf_put_rotated); tq->bufs[ix] = NULL; @@ -282,7 +300,10 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, rxrpc_txqueue_rotate)); seq++; + trace = true; if (!(seq & RXRPC_TXQ_MASK)) { + trace_rxrpc_rack_update(call, summary); + trace = false; prefetch(tq->next); if (tq != call->tx_qtail) { call->tx_qbase += RXRPC_NR_TXQUEUE; @@ -299,6 +320,9 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, } while (before_eq(seq, to)); + if (trace) + trace_rxrpc_rack_update(call, summary); + if (rot_last) { set_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags); if (tq) { @@ -325,8 +349,10 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun, { ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags)); - call->resend_at = KTIME_MAX; - trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend); + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_OFF; + call->rack_timo_at = KTIME_MAX; + trace_rxrpc_rack_timer(call, 0, false); + trace_rxrpc_timer_can(call, rxrpc_timer_trace_rack_off + call->rack_timer_mode); switch (__rxrpc_call_state(call)) { case RXRPC_CALL_CLIENT_SEND_REQUEST: @@ -842,10 +868,13 @@ static void rxrpc_input_soft_ack_tq(struct rxrpc_call *call, rxrpc_seq_t seq, rxrpc_seq_t *lowest_nak) { - unsigned long old_reported, flipped, new_acks, a_to_n, n_to_a; + unsigned long old_reported = 0, flipped, new_acks = 0; + unsigned long a_to_n, n_to_a = 0; int new, a, n; - old_reported = ~0UL >> 
(RXRPC_NR_TXQUEUE - tq->nr_reported_acks); + if (tq->nr_reported_acks > 0) + old_reported = ~0UL >> (RXRPC_NR_TXQUEUE - tq->nr_reported_acks); + _enter("{%x,%lx,%d},%lx,%d,%x", tq->qbase, tq->segment_acked, tq->nr_reported_acks, extracted_acks, nr_reported, seq); @@ -898,6 +927,18 @@ static void rxrpc_input_soft_ack_tq(struct rxrpc_call *call, if (before(lowest, *lowest_nak)) *lowest_nak = lowest; } + + if (summary->acked_serial) + rxrpc_input_soft_rtt(call, summary, tq); + + new_acks |= n_to_a; + if (new_acks) + rxrpc_input_rack(call, summary, tq, new_acks); + + if (call->tlp_serial && + rxrpc_seq_in_txq(tq, call->tlp_seq) && + test_bit(call->tlp_seq - tq->qbase, &new_acks)) + summary->tlp_probe_acked = true; } /* @@ -940,8 +981,6 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, _debug("bound %16lx %u", extracted, nr); - if (summary->rtt_sample_avail) - rxrpc_input_soft_rtt(call, summary, tq); rxrpc_input_soft_ack_tq(call, summary, tq, extracted, RXRPC_NR_TXQUEUE, seq - RXRPC_NR_TXQUEUE, &lowest_nak); extracted = ~0UL; @@ -1063,7 +1102,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) /* Discard any out-of-order or duplicate ACKs (outside lock). */ if (!rxrpc_is_ack_valid(call, hard_ack, prev_pkt)) { trace_rxrpc_rx_discard_ack(call, summary.ack_serial, hard_ack, prev_pkt); - goto send_response; + goto send_response; /* Still respond if requested. 
*/ } trailer.maxMTU = 0; @@ -1079,14 +1118,19 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) call->acks_hard_ack = hard_ack; call->acks_prev_seq = prev_pkt; - switch (summary.ack_reason) { - case RXRPC_ACK_PING: - break; - default: - if (summary.acked_serial && - after(summary.acked_serial, call->acks_highest_serial)) - call->acks_highest_serial = summary.acked_serial; - break; + if (summary.acked_serial) { + switch (summary.ack_reason) { + case RXRPC_ACK_PING_RESPONSE: + rxrpc_complete_rtt_probe(call, call->acks_latest_ts, + summary.acked_serial, summary.ack_serial, + rxrpc_rtt_rx_ping_response); + break; + default: + if (after(summary.acked_serial, call->acks_highest_serial)) + call->acks_highest_serial = summary.acked_serial; + summary.rtt_sample_avail = true; + break; + } } /* Parse rwind and mtu sizes if provided. */ @@ -1096,15 +1140,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (hard_ack + 1 == 0) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero); - if (summary.acked_serial) { - if (summary.ack_reason == RXRPC_ACK_PING_RESPONSE) - rxrpc_complete_rtt_probe(call, call->acks_latest_ts, - summary.acked_serial, summary.ack_serial, - rxrpc_rtt_rx_ping_response); - else - summary.rtt_sample_avail = true; - } - /* Ignore ACKs unless we are or have just been transmitting. */ switch (__rxrpc_call_state(call)) { case RXRPC_CALL_CLIENT_SEND_REQUEST: @@ -1141,10 +1176,14 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_propose_ping(call, summary.ack_serial, rxrpc_propose_ack_ping_for_lost_reply); + /* Drive the congestion management algorithm first and then RACK-TLP as + * the latter depends on the state/change in state in the former. 
*/ rxrpc_congestion_management(call, &summary); - if (summary.need_retransmit) - rxrpc_resend(call, summary.ack_serial, - summary.ack_reason == RXRPC_ACK_PING_RESPONSE); + rxrpc_rack_detect_loss_and_arm_timer(call, &summary); + rxrpc_tlp_process_ack(call, &summary); + if (call->tlp_serial && after_eq(summary.acked_serial, call->tlp_serial)) + call->tlp_serial = 0; send_response: if (summary.ack_reason == RXRPC_ACK_PING) diff --git a/net/rxrpc/input_rack.c b/net/rxrpc/input_rack.c new file mode 100644 index 000000000000..13c371261e0a --- /dev/null +++ b/net/rxrpc/input_rack.c @@ -0,0 +1,418 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* RACK-TLP [RFC8985] Implementation + * + * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include "ar-internal.h" + +static bool rxrpc_rack_sent_after(ktime_t t1, rxrpc_seq_t seq1, + ktime_t t2, rxrpc_seq_t seq2) +{ + if (ktime_after(t1, t2)) + return true; + return t1 == t2 && after(seq1, seq2); +} + +/* + * Mark a packet lost. + */ +static void rxrpc_rack_mark_lost(struct rxrpc_call *call, + struct rxrpc_txqueue *tq, unsigned int ix) +{ + if (__test_and_set_bit(ix, &tq->segment_lost)) { + if (__test_and_clear_bit(ix, &tq->segment_retransmitted)) + call->tx_nr_resent--; + } else { + call->tx_nr_lost++; + } + tq->segment_xmit_ts[ix] = UINT_MAX; +} + +/* + * Get the transmission time of a packet in the Tx queue. + */ +static ktime_t rxrpc_get_xmit_ts(const struct rxrpc_txqueue *tq, unsigned int ix) +{ + if (tq->segment_xmit_ts[ix] == UINT_MAX) + return KTIME_MAX; + return ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]); +} + +/* + * Get a bitmask of nack bits for a queue segment and mask off any that aren't + * yet reported.
*/ +static unsigned long rxrpc_tq_nacks(const struct rxrpc_txqueue *tq) +{ + unsigned long nacks = ~tq->segment_acked; + + if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE) + nacks &= (1UL << tq->nr_reported_acks) - 1; + return nacks; +} + +/* + * Update the RACK state for the most recently sent packet that has been + * delivered [RFC8985 6.2 Step 2]. + */ +static void rxrpc_rack_update(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned int ix) +{ + rxrpc_seq_t seq = tq->qbase + ix; + ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix); + ktime_t rtt = ktime_sub(call->acks_latest_ts, xmit_ts); + + if (__test_and_clear_bit(ix, &tq->segment_lost)) + call->tx_nr_lost--; + + if (test_bit(ix, &tq->segment_retransmitted)) { + /* Use Rx.serial instead of TCP.ACK.ts_option.echo_reply. */ + if (before(call->acks_highest_serial, tq->segment_serial[ix])) + return; + if (rtt < minmax_get(&call->min_rtt)) + return; + } + + /* The RACK algorithm requires the segment ACKs to be traversed in + * order of segment transmission - but the only thing this seems to + * matter for is that RACK.rtt is set to the rtt of the most recently + * transmitted segment. We should be able to achieve the same by only + * setting RACK.rtt if the xmit time is greater. + */ + if (ktime_after(xmit_ts, call->rack_rtt_ts)) { + call->rack_rtt = rtt; + call->rack_rtt_ts = xmit_ts; + } + + if (rxrpc_rack_sent_after(xmit_ts, seq, call->rack_xmit_ts, call->rack_end_seq)) { + call->rack_rtt = rtt; + call->rack_xmit_ts = xmit_ts; + call->rack_end_seq = seq; + } +} + +/* + * Detect data segment reordering [RFC8985 6.2 Step 3]. + */ +static void rxrpc_rack_detect_reordering(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned int ix) +{ + rxrpc_seq_t seq = tq->qbase + ix; + + /* Track the highest sequence number so far ACK'd.
This is not + * necessarily the same as ack.firstPacket + ack.nAcks - 1 as the peer + * could put a NACK in the last SACK slot. + */ + if (after(seq, call->rack_fack)) + call->rack_fack = seq; + else if (before(seq, call->rack_fack) && + test_bit(ix, &tq->segment_retransmitted)) + call->rack_reordering_seen = true; +} + +void rxrpc_input_rack_one(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned int ix) +{ + rxrpc_rack_update(call, summary, tq, ix); + rxrpc_rack_detect_reordering(call, summary, tq, ix); +} + +void rxrpc_input_rack(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct rxrpc_txqueue *tq, + unsigned long new_acks) +{ + while (new_acks) { + unsigned int ix = __ffs(new_acks); + + __clear_bit(ix, &new_acks); + rxrpc_input_rack_one(call, summary, tq, ix); + } + + trace_rxrpc_rack_update(call, summary); +} + +/* + * Update the reordering window [RFC8985 6.2 Step 4]. Returns the updated + * duration of the reordering window. + * + * Note that the Rx protocol doesn't have a 'DSACK option' per se, but ACKs can + * be given a 'DUPLICATE' reason with the serial number referring to the + * duplicated DATA packet. Rx does not indicate whether this was a + * reception of the same packet twice or of a retransmission of a packet we + * already received (though this could be determined by the transmitter based + * on the serial number).
*/ +static ktime_t rxrpc_rack_update_reo_wnd(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary) +{ + rxrpc_seq_t snd_una = call->acks_lowest_nak; /* Lowest unack'd seq */ + rxrpc_seq_t snd_nxt = call->tx_transmitted + 1; /* Next seq to be sent */ + bool have_dsack_option = summary->ack_reason == RXRPC_ACK_DUPLICATE; + int dup_thresh = 3; + + /* DSACK-based reordering window adaptation */ + if (!call->rack_dsack_round_none && + after_eq(snd_una, call->rack_dsack_round)) + call->rack_dsack_round_none = true; + + /* Grow the reordering window per round that sees DSACK. Reset the + * window after 16 DSACK-free recoveries. + */ + if (call->rack_dsack_round_none && have_dsack_option) { + call->rack_dsack_round_none = false; + call->rack_dsack_round = snd_nxt; + call->rack_reo_wnd_mult++; + call->rack_reo_wnd_persist = 16; + } else if (summary->exiting_fast_or_rto_recovery) { + call->rack_reo_wnd_persist--; + if (call->rack_reo_wnd_persist <= 0) + call->rack_reo_wnd_mult = 1; + } + + if (!call->rack_reordering_seen) { + if (summary->in_fast_or_rto_recovery) + return 0; + if (call->acks_nr_sacks >= dup_thresh) + return 0; + } + + return us_to_ktime(umin(call->rack_reo_wnd_mult * minmax_get(&call->min_rtt) / 4, + call->srtt_us >> 3)); +} + +/* + * Detect losses [RFC8985 6.2 Step 5].
*/ +static ktime_t rxrpc_rack_detect_loss(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary) +{ + struct rxrpc_txqueue *tq; + ktime_t timeout = 0, lost_after, now = ktime_get_real(); + + call->rack_reo_wnd = rxrpc_rack_update_reo_wnd(call, summary); + lost_after = ktime_add(call->rack_rtt, call->rack_reo_wnd); + trace_rxrpc_rack_scan_loss(call); + + for (tq = call->tx_queue; tq; tq = tq->next) { + unsigned long nacks = rxrpc_tq_nacks(tq); + + if (after(tq->qbase, call->tx_transmitted)) + break; + trace_rxrpc_rack_scan_loss_tq(call, tq, nacks); + + /* Skip ones marked lost but not yet retransmitted */ + nacks &= ~tq->segment_lost | tq->segment_retransmitted; + + while (nacks) { + unsigned int ix = __ffs(nacks); + rxrpc_seq_t seq = tq->qbase + ix; + ktime_t remaining; + ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix); + + __clear_bit(ix, &nacks); + + if (rxrpc_rack_sent_after(call->rack_xmit_ts, call->rack_end_seq, + xmit_ts, seq)) { + remaining = ktime_sub(ktime_add(xmit_ts, lost_after), now); + if (remaining <= 0) { + rxrpc_rack_mark_lost(call, tq, ix); + trace_rxrpc_rack_detect_loss(call, summary, seq); + } else { + timeout = max(remaining, timeout); + } + } + } + } + + return timeout; +} + +/* + * Detect losses and set a timer to retry the detection [RFC8985 6.2 Step 5]. + */ +void rxrpc_rack_detect_loss_and_arm_timer(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary) +{ + ktime_t timeout = rxrpc_rack_detect_loss(call, summary); + + if (timeout) { + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_RACK_REORDER; + call->rack_timo_at = ktime_add(ktime_get_real(), timeout); + trace_rxrpc_rack_timer(call, timeout, false); + trace_rxrpc_timer_set(call, timeout, rxrpc_timer_trace_rack_reo); + } +} + +/* + * Handle RACK-TLP RTO expiration [RFC8985 6.3].
*/ +static void rxrpc_rack_mark_losses_on_rto(struct rxrpc_call *call) +{ + struct rxrpc_txqueue *tq; + rxrpc_seq_t snd_una = call->acks_lowest_nak; /* Lowest unack'd seq */ + ktime_t lost_after = ktime_add(call->rack_rtt, call->rack_reo_wnd); + ktime_t deadline = ktime_sub(ktime_get_real(), lost_after); + + for (tq = call->tx_queue; tq; tq = tq->next) { + unsigned long unacked = ~tq->segment_acked; + + trace_rxrpc_rack_mark_loss_tq(call, tq); + while (unacked) { + unsigned int ix = __ffs(unacked); + rxrpc_seq_t seq = tq->qbase + ix; + ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix); + + if (after(seq, call->tx_transmitted)) + return; + __clear_bit(ix, &unacked); + + if (seq == snd_una || + ktime_before(xmit_ts, deadline)) + rxrpc_rack_mark_lost(call, tq, ix); + } + } +} + +/* + * Calculate the TLP loss probe timeout (PTO) [RFC8985 7.2]. + */ +ktime_t rxrpc_tlp_calc_pto(struct rxrpc_call *call, ktime_t now) +{ + unsigned int flight_size = rxrpc_tx_in_flight(call); + ktime_t rto_at = ktime_add(call->tx_last_sent, + rxrpc_get_rto_backoff(call, false)); + ktime_t pto; + + if (call->rtt_count > 0) { + /* Use 2*SRTT as the timeout (srtt_us holds SRTT scaled by 8). */ + pto = ns_to_ktime(call->srtt_us * NSEC_PER_USEC / 4); + if (flight_size) + pto = ktime_add(pto, call->tlp_max_ack_delay); + } else { + pto = NSEC_PER_SEC; + } + + if (ktime_after(ktime_add(now, pto), rto_at)) + pto = ktime_sub(rto_at, now); + return pto; +} + +/* + * Send a TLP loss probe on PTO expiration [RFC8985 7.3]. + */ +void rxrpc_tlp_send_probe(struct rxrpc_call *call) +{ + unsigned int in_flight = rxrpc_tx_in_flight(call); + + if (after_eq(call->acks_hard_ack, call->tx_transmitted)) + return; /* Everything we transmitted has been acked. */ + + /* There must be no other loss probe still in flight and we need to + * have taken a new RTT sample since last probe or the start of + * connection.
*/ + if (!call->tlp_serial && + call->tlp_rtt_taken != call->rtt_taken) { + call->tlp_is_retrans = false; + if (after(call->send_top, call->tx_transmitted) && + rxrpc_tx_window_space(call) > 0) { + /* Transmit the lowest-sequence unsent DATA */ + call->tx_last_serial = 0; + rxrpc_transmit_some_data(call, 1, rxrpc_txdata_tlp_new_data); + call->tlp_serial = call->tx_last_serial; + call->tlp_seq = call->tx_transmitted; + trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_transmit_new); + in_flight = rxrpc_tx_in_flight(call); + } else { + /* Retransmit the highest-sequence DATA sent */ + call->tx_last_serial = 0; + rxrpc_resend_tlp(call); + call->tlp_is_retrans = true; + trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_retransmit); + } + } else { + trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_busy); + } + + if (in_flight != 0) { + ktime_t rto = rxrpc_get_rto_backoff(call, false); + + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_RTO; + call->rack_timo_at = ktime_add(ktime_get_real(), rto); + trace_rxrpc_rack_timer(call, rto, false); + trace_rxrpc_timer_set(call, rto, rxrpc_timer_trace_rack_rto); + } +} + +/* + * Detect losses using the ACK of a TLP loss probe [RFC8985 7.4].
*/ +void rxrpc_tlp_process_ack(struct rxrpc_call *call, struct rxrpc_ack_summary *summary) +{ + if (!call->tlp_serial || after(call->tlp_seq, call->acks_hard_ack)) + return; + + if (!call->tlp_is_retrans) { + /* TLP of new data delivered */ + trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_new_data); + call->tlp_serial = 0; + } else if (summary->ack_reason == RXRPC_ACK_DUPLICATE && + summary->acked_serial == call->tlp_serial) { + /* General Case: Detected packet losses using RACK [7.4.1] */ + trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_dup_acked); + call->tlp_serial = 0; + } else if (after(call->acks_hard_ack, call->tlp_seq)) { + /* Repaired the single loss */ + trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_hard_beyond); + call->tlp_serial = 0; + // TODO: Invoke congestion control to react to the loss + // event the probe has repaired + } else if (summary->tlp_probe_acked) { + trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_acked); + /* Special Case: Detected a single loss repaired by the loss + * probe [7.4.2] + */ + call->tlp_serial = 0; + } else { + trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_incomplete); + } +} + +/* + * Handle RACK timer expiration by dispatching on the timer mode that was armed.
*/ +void rxrpc_rack_timer_expired(struct rxrpc_call *call, ktime_t overran_by) +{ + struct rxrpc_ack_summary summary = {}; + enum rxrpc_rack_timer_mode mode = call->rack_timer_mode; + + trace_rxrpc_rack_timer(call, overran_by, true); + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_OFF; + + switch (mode) { + case RXRPC_CALL_RACKTIMER_RACK_REORDER: + rxrpc_rack_detect_loss_and_arm_timer(call, &summary); + break; + case RXRPC_CALL_RACKTIMER_TLP_PTO: + rxrpc_tlp_send_probe(call); + break; + case RXRPC_CALL_RACKTIMER_RTO: + // Might need to poke the congestion algo in some way + rxrpc_rack_mark_losses_on_rto(call); + break; + //case RXRPC_CALL_RACKTIMER_ZEROWIN: + default: + pr_warn("Unexpected rack timer %u\n", mode); + } +} diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c index fbacf2056f64..2925c7fc82cf 100644 --- a/net/rxrpc/io_thread.c +++ b/net/rxrpc/io_thread.c @@ -470,6 +470,7 @@ int rxrpc_io_thread(void *data) spin_lock_irq(&local->rx_queue.lock); skb_queue_splice_tail_init(&local->rx_queue, &rx_queue); spin_unlock_irq(&local->rx_queue.lock); + trace_rxrpc_iothread_rx(local, skb_queue_len(&rx_queue)); } /* Distribute packets and errors.
*/ diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index f934551a9b1c..6f7a125d6e90 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -542,12 +542,14 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se unsigned int xmit_ts; rxrpc_seq_t seq = req->seq; size_t len = 0; + bool start_tlp = false; trace_rxrpc_tq(call, tq, seq, rxrpc_tq_transmit); /* Each transmission of a Tx packet needs a new serial number */ serial = rxrpc_get_next_serials(call->conn, req->n); + call->tx_last_serial = serial + req->n - 1; call->tx_last_sent = req->now; xmit_ts = rxrpc_prepare_txqueue(tq, req); prefetch(tq->next); @@ -557,6 +559,18 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se struct rxrpc_txbuf *txb = tq->bufs[seq & RXRPC_TXQ_MASK]; _debug("prep[%u] tq=%x q=%x", i, tq->qbase, seq); + + /* Record (re-)transmission for RACK [RFC8985 6.1]. */ + if (__test_and_clear_bit(ix, &tq->segment_lost)) + call->tx_nr_lost--; + if (req->retrans) { + __set_bit(ix, &tq->ever_retransmitted); + __set_bit(ix, &tq->segment_retransmitted); + call->tx_nr_resent++; + } else { + call->tx_nr_sent++; + start_tlp = true; + } tq->segment_xmit_ts[ix] = xmit_ts; tq->segment_serial[ix] = serial; if (i + 1 == req->n) @@ -576,11 +590,24 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se } /* Set timeouts */ - if (call->rtt_count > 1) { - ktime_t delay = rxrpc_get_rto_backoff(call, false); + if (req->tlp_probe) { + /* Sending TLP loss probe [RFC8985 7.3]. */ + call->tlp_serial = serial - 1; + call->tlp_seq = seq - 1; + } else if (start_tlp) { + /* Schedule TLP loss probe [RFC8985 7.2]. */ + ktime_t pto; + + if (!test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) + /* The first packet may take longer to elicit a response. 
*/ + pto = NSEC_PER_SEC; + else + pto = rxrpc_tlp_calc_pto(call, req->now); - call->ack_lost_at = ktime_add(req->now, delay); - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_TLP_PTO; + call->rack_timo_at = ktime_add(req->now, pto); + trace_rxrpc_rack_timer(call, pto, false); + trace_rxrpc_timer_set(call, pto, rxrpc_timer_trace_rack_tlp_pto); } if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) { @@ -589,12 +616,6 @@ static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_se call->expect_rx_by = ktime_add(req->now, delay); trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_expect_rx); } - if (call->resend_at == KTIME_MAX) { - ktime_t delay = rxrpc_get_rto_backoff(call, false); - - call->resend_at = ktime_add(req->now, delay); - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_resend); - } rxrpc_set_keepalive(call, req->now); return len;
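The loss scan in rxrpc_rack_detect_loss() follows RFC 8985 section 6.2 Step 5: an un-ACK'd packet sent before the most recently delivered packet (RACK.xmit_ts, RACK.end_seq) is declared lost once RACK.rtt + RACK.reo_wnd has elapsed since it was sent, and the longest time still remaining is returned so the reorder timer can be re-armed. The following is a minimal userspace sketch of that check, not the kernel code: `ktime_ns`, `seq_t`, `struct pkt`, `struct rack_state` and `rack_detect_loss()` are hypothetical stand-ins, and for simplicity it skips every packet already marked lost (the patch only skips ones that are lost but not yet retransmitted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for ktime_t and rxrpc_seq_t. */
typedef int64_t ktime_ns;
typedef uint32_t seq_t;

/* Wrap-safe "seq1 is after seq2", in the style of the kernel's after(). */
static bool seq_after(seq_t seq1, seq_t seq2)
{
	return (int32_t)(seq1 - seq2) > 0;
}

/* RACK ordering: a later transmit time wins; equal times fall back to sequence. */
static bool rack_sent_after(ktime_ns t1, seq_t seq1, ktime_ns t2, seq_t seq2)
{
	if (t1 > t2)
		return true;
	return t1 == t2 && seq_after(seq1, seq2);
}

/* One transmitted DATA packet (hypothetical layout). */
struct pkt {
	seq_t seq;
	ktime_ns xmit_ts;
	bool acked;
	bool lost;
};

/* RACK state taken from the most recently sent packet that was delivered. */
struct rack_state {
	ktime_ns xmit_ts;	/* RACK.xmit_ts */
	seq_t end_seq;		/* RACK.end_seq */
	ktime_ns rtt;		/* RACK.rtt */
	ktime_ns reo_wnd;	/* RACK.reo_wnd */
};

/*
 * Mark packets lost that were sent before the RACK reference packet and have
 * been outstanding for longer than RACK.rtt + RACK.reo_wnd.  Returns the
 * longest time still remaining for any other candidate, or 0 if none.
 */
static ktime_ns rack_detect_loss(const struct rack_state *rack,
				 struct pkt *pkts, int n, ktime_ns now)
{
	ktime_ns lost_after = rack->rtt + rack->reo_wnd;
	ktime_ns timeout = 0;

	for (int i = 0; i < n; i++) {
		struct pkt *p = &pkts[i];
		ktime_ns remaining;

		if (p->acked || p->lost)
			continue;
		/* Only packets sent before the newest delivered one qualify. */
		if (!rack_sent_after(rack->xmit_ts, rack->end_seq, p->xmit_ts, p->seq))
			continue;
		remaining = p->xmit_ts + lost_after - now;
		if (remaining <= 0)
			p->lost = true;
		else if (remaining > timeout)
			timeout = remaining;
	}
	return timeout;
}
```

For example, with RACK.xmit_ts = 100, RACK.rtt = 20 and RACK.reo_wnd = 5, at time 115 a packet sent at 85 is marked lost, a packet sent at 95 still has 5 time units to go (so the timer would be re-armed for 5), and a packet sent at 105 is not yet a candidate because it went out after the reference packet.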
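rxrpc_rack_update_reo_wnd() implements RFC 8985 section 6.2 Step 4. The reordering window is min(reo_wnd_mult * min_RTT / 4, SRTT). The multiplier grows once per round that sees a DSACK, which Rx approximates with a DUPLICATE ACK, and it falls back to 1 after 16 DSACK-free recoveries. Until reordering has been observed, the window is 0 during recovery or once at least DupThresh (3) packets are SACK'd. A userspace sketch of that logic follows, assuming microsecond units with SRTT scaled by 8, as in the patch's `srtt_us >> 3`. The names `struct reo_state`, `struct reo_input` and `rack_update_reo_wnd()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t seq_t;

/* Wrap-safe "a is at or after b", in the style of the kernel's after_eq(). */
static bool seq_after_eq(seq_t a, seq_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Persistent per-call reordering state (hypothetical). */
struct reo_state {
	bool dsack_round_none;	/* No DSACK round in progress */
	seq_t dsack_round;	/* snd_nxt when the last DSACK round began */
	unsigned int wnd_mult;	/* RACK.reo_wnd_mult, starts at 1 */
	int wnd_persist;	/* DSACK-free recoveries left before reset */
	bool reordering_seen;	/* RACK.reordering_seen */
};

/* Per-ACK inputs (hypothetical). */
struct reo_input {
	seq_t snd_una, snd_nxt;
	bool have_dsack;	/* ACK reason was DUPLICATE */
	bool exiting_recovery;
	bool in_recovery;
	unsigned int nr_sacks;
	uint32_t min_rtt_us;
	uint32_t srtt_us_x8;	/* SRTT in microseconds, scaled by 8 */
};

/* Returns the reordering window in microseconds. */
static uint32_t rack_update_reo_wnd(struct reo_state *s, const struct reo_input *in)
{
	const unsigned int dup_thresh = 3;
	uint32_t wnd, srtt_us = in->srtt_us_x8 >> 3;

	/* A DSACK round ends once everything sent before it is ACK'd. */
	if (!s->dsack_round_none && seq_after_eq(in->snd_una, s->dsack_round))
		s->dsack_round_none = true;

	if (s->dsack_round_none && in->have_dsack) {
		s->dsack_round_none = false;
		s->dsack_round = in->snd_nxt;
		s->wnd_mult++;
		s->wnd_persist = 16;
	} else if (in->exiting_recovery) {
		s->wnd_persist--;
		if (s->wnd_persist <= 0)
			s->wnd_mult = 1;
	}

	if (!s->reordering_seen) {
		if (in->in_recovery)
			return 0;
		if (in->nr_sacks >= dup_thresh)
			return 0;
	}

	wnd = s->wnd_mult * in->min_rtt_us / 4;
	return wnd < srtt_us ? wnd : srtt_us;
}
```

With min_RTT = 8 ms and SRTT = 10 ms, the window starts at 2 ms and doubles to 4 ms after one DUPLICATE ACK. Once min_RTT grows, the SRTT cap of 10 ms takes over.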