From patchwork Tue Feb 18 18:36:12 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980498 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5CC3517A2EC; Tue, 18 Feb 2025 18:36:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903792; cv=none; b=PG9S90cExD7WYHtJBb9DgTuioEX+PwcOQMmFbyMfCCcs4LjZhOuMCWKqqeGXwYASO2IMVHzw6CLQzOLMKs2E1T+Iuf2UELuZG+bdbsaZdopBk7JgIoqzRTpwJG4Nty8+zm5jjRrHFVsJ3PggOW6ypODc7Tv8zzVSpqWcDvFbkng= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903792; c=relaxed/simple; bh=ACPJ+yPY2W1joc1pWuexOVqHoRklIHDbYoAvIJBUces=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=IivbUl05UoUliazvhm92tpDHtLxLU2kplD+Wef3fkA+yIlNMgjjUKOJsw5JOMmtVY9sjdZ1wFm+9qdfXRndi4vJuF4DUv1txNd0EHayvKGfLz2u+g7qsuv0UTlCuLj/av0SP7JNEgSBpfYnXlVTN+u+MbY+skTHPyajzO3SeWSg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vCVhDslj; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vCVhDslj" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA63DC4CEEC; Tue, 18 Feb 2025 18:36:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903790; bh=ACPJ+yPY2W1joc1pWuexOVqHoRklIHDbYoAvIJBUces=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=vCVhDsljVOLC+ZuCrbHzsCMseOuvPK6IM8frJ6YPpjw/ofh8Gyme3GvfSIbY62w+L Q/DMBlraTT0WxRBy6V1d3uJBAhEV3r8HnfFIjcKRbchcmIPyrnog7moRoxE2py5eEp KXNgDfHOrmWbw4Bdrw5Ow97ckMX5RVdoiwxbDT7mrlJ64CkbjSC/ywj3q8Fzd/odaw UR0qrwzX07WWlmyW12wElxeQLEKM3b5T3q1VZ1Hs56RLpJEYv60wKD2abTgkhdfDai D6v5x9rbpHhTLzJ8QuhafgiQCZlziqt/UD2fkVmKXovTaIPqnnE377U5nAuf12CREt Ym9hDOqMDXB6A== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:12 +0100 Subject: [PATCH net-next 1/7] mptcp: consolidate subflow cleanup Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-1-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Jamal Hadi Salim , Cong Wang , Jiri Pirko , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=3128; i=matttbe@kernel.org; h=from:subject:message-id; bh=zRyPHzOQB2XFEMTBCqtrVF98bVbR+f2SqeWZUx2E3Mk=; b=owEBbQKS/ZANAwAIAfa3gk9CaaBzAcsmYgBntNMnQ625mFfmmXvXNxxfMGbI3LG3oiO7dwtQo VC68j5mG52JAjMEAAEIAB0WIQToy4X3aHcFem4n93r2t4JPQmmgcwUCZ7TTJwAKCRD2t4JPQmmg cwC8D/43bwVvWQSnpOYMdB0IUUji3qmRDXfax0ap6JB5FwA6aPGzzySU8tynCEX2ZswNINX6ytJ vlAceGeapAN+SA6Of2kvo8vmlcBinvb1Unb50dTwjgx5fUY4/6rwYRAEXbanR2jOw87T7MHzIGq ozFcB/KKveDT3wM8Q1/tktDBf6SW7eyPvDsAciZpSe2keAN42A+s9rwi2wM5jiqn5BcrxGl85bw X9n2wKvoirZ0pYQWsa8f3O3VZeksUsPiFZK13dTtnFWbmCYrVtplUnPmFSJZQFf4sNHdtkZvP5t jQkf0D9PfKQ+DrofrCJpNQRDRtyYMc8SivCafViYBKSFzYAhR5UkJJWbCQbUCPEQw/fYZhtYAc7 hhfAOJi6VfLrYg3khVuFZg29mrCJrHW0hzDKE5FjMxaBN/RPpcnPUwwJybgbNsPdhvMXNNEcRK7 LeHnQGdXXR8OQterDWgACO2laWpUnGuoLXoXXwBnz+kX2RL/tRBj9Es5ImYrBzroBldhtDfPJWk rSxHqaDIGedrP2G/lptfQbarakYVXi+/ancBR5ULW/SS5MmB995Vz21664xEdtlp9XjmpLifThc J2u4bAb+hgiLLyNtputga+N9trgyN6qCmmBv7hF4PkJJP2uXvFpZYqwKK03xLstWOBdraruMrVq otSK2CtPyRpqmfA== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 X-Patchwork-Delegate: kuba@kernel.org From: Paolo Abeni Consolidate all the cleanup actions requiring the worker in a single helper and ensure the dummy data fin creation for fallback socket is performed only when the tcp rx queue is empty. There are no functional changes intended, but this will simplify the next patch, when the tcp rx queue spooling could be delayed at release_cb time. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/subflow.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index fd021cf8286eff9234b950a4d4c083ea7756eba3..2926bdf88e42c5f2db6875b00b4eca2dbf49dba2 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1271,7 +1271,12 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb, subflow->map_valid = 0; } -/* sched mptcp worker to remove the subflow if no more data is pending */ +static bool subflow_is_done(const struct sock *sk) +{ + return sk->sk_shutdown & RCV_SHUTDOWN || sk->sk_state == TCP_CLOSE; +} + +/* sched mptcp worker for subflow cleanup if no more data is pending */ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk) { struct sock *sk = (struct sock *)msk; @@ -1281,8 +1286,18 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss inet_sk_state_load(sk) != TCP_ESTABLISHED))) return; - if (skb_queue_empty(&ssk->sk_receive_queue) && - !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) + if (!skb_queue_empty(&ssk->sk_receive_queue)) + return; + + if (!test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) + mptcp_schedule_work(sk); + + /* when the fallback subflow closes the rx side, trigger a 'dummy' + * ingress data fin, so that the msk state will follow along + */ + if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) && + msk->first == ssk && + mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true)) mptcp_schedule_work(sk); } @@ -1842,11 +1857,6 @@ static void __subflow_state_change(struct sock *sk) rcu_read_unlock(); } -static bool 
subflow_is_done(const struct sock *sk) -{ - return sk->sk_shutdown & RCV_SHUTDOWN || sk->sk_state == TCP_CLOSE; -} - static void subflow_state_change(struct sock *sk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); @@ -1873,13 +1883,6 @@ static void subflow_state_change(struct sock *sk) subflow_error_report(sk); subflow_sched_work_if_closed(mptcp_sk(parent), sk); - - /* when the fallback subflow closes the rx side, trigger a 'dummy' - * ingress data fin, so that the msk state will follow along - */ - if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk && - mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true)) - mptcp_schedule_work(parent); } void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk) From patchwork Tue Feb 18 18:36:13 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980499 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 04A0A1F5822; Tue, 18 Feb 2025 18:36:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903796; cv=none; b=rYrq0ACskQ/uLI35l+0i1FMIbvSWPgE2UR6oH0BA6ARDypFZFHePT6mw+6aWNEv8T909edkX3Ld54ECfhS+gJSuSrw2Qf86eu98j+yKcp5Nqw12ZyCxDBlUcntvC6KoRXeMvgrzmeNFoclICrm+O+xTnVTxq1lwbjJNUls7yPX0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903796; c=relaxed/simple; bh=wuahuA5ylgrgljO5VDYDx96x4R3nL275ICd3P5Wv79o=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=dtNsDniIQaOpFkZv9O88qOzW5Q0BIV3KapOTFGL+Y2sW2ZRQkPTEZQIK4TlWigbNZseNYBkpdsIxgEA3MoPDfYWUPD30GOHcFQY4o1dqcrUQwN0YsItiLj4mDdCb/CU6HRL377KPaGtFgxI0KNppzhWqpn37odex4Ab99m91oSU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qrTWpMq+; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qrTWpMq+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 508F6C4CEE2; Tue, 18 Feb 2025 18:36:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903794; bh=wuahuA5ylgrgljO5VDYDx96x4R3nL275ICd3P5Wv79o=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=qrTWpMq+FcCb2iAeqN9Nq26JmItHNfZqGiDWr+YEEJp9IqvQ2B3LGgL68Qy47hmcK QlicUzF2TgSiexpQzZQy/Tb/NK2Vd5/kXyy0YiZnZr3S8qcZkAyFi3gEeTF3eIDuRB R8St2XEoqmi/7u+XqKwjbcgL8KirdWZkN7N+OBhohktze4jXeuI6CRlh49i0Wkg9Ao 9JyPg0z6YPWH0Zj+IvJwxxalosNdCeGX7DSMjZbpISXyWgJ7oPWzCNAgjdj9/dacUh XTLeOZRWdCRNk2SX58a7QbwoDa2EZAN+ogn8S58VtGOw3LTT8DX1TIqo52Vx/JqViD 1ErvGwC+vg8GA== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:13 +0100 Subject: [PATCH net-next 2/7] mptcp: drop __mptcp_fastopen_gen_msk_ackseq() Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-2-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: 
 <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org>
To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang ,
 "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni ,
 Simon Horman 
Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern ,
 Jamal Hadi Salim , Cong Wang , Jiri Pirko ,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 "Matthieu Baerts (NGI0)" 
X-Mailer: b4 0.14.2
X-Patchwork-Delegate: kuba@kernel.org

From: Paolo Abeni

When the whole RX path is moved under the msk socket lock (in a later
patch of this series), updating the already-queued skbs of a passive
fastopen socket at third-ACK time would be extremely painful and race
prone.

The map_seq of the already-enqueued skbs is only used to allow correct
coalescing with later data; by preventing coalescing with the first skb
of a fastopen connection, the __mptcp_fastopen_gen_msk_ackseq() helper
can be removed completely.

Before dropping this helper, a new item had to be added to the
mptcp_skb_cb structure. Because this item will be frequently tested in
the fast path -- almost on every packet -- and because there is free
space there, a single byte is used instead of a bitfield. This
micro-optimisation slightly reduces the number of CPU operations needed
for the associated check.
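To illustrate the new control-block flag, here is a standalone sketch in
plain C (not kernel code: skb_cb_sketch and can_coalesce() below are
simplified stand-ins for mptcp_skb_cb and mptcp_try_coalesce()):

/* Simplified stand-in for the MPTCP skb control block: the flag is a
 * full byte, so the fast-path test is a plain load and compare, with
 * no mask/shift as a one-bit bitfield would require.
 */
struct skb_cb_sketch {
	unsigned long long map_seq;
	unsigned long long end_seq;
	unsigned int offset;
	unsigned char has_rxtstamp;
	unsigned char cant_coalesce;	/* set only for pre-MPC fastopen data */
};

/* Sketch of the guard: a buffer marked cant_coalesce is never merged
 * with later in-sequence data, so its relative map_seq never needs to
 * be rewritten once the MPC handshake completes.
 */
static int can_coalesce(const struct skb_cb_sketch *to,
			const struct skb_cb_sketch *from)
{
	if (to->cant_coalesce)
		return 0;
	return from->offset == 0;
}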
Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/fastopen.c | 24 ++---------------------- net/mptcp/protocol.c | 4 +++- net/mptcp/protocol.h | 5 ++--- net/mptcp/subflow.c | 3 --- 4 files changed, 7 insertions(+), 29 deletions(-) diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c index a29ff901df7588dec24e330ddd77a4aeb1462b68..7777f5a2d14379853fcd13c4b57c5569be05a2e4 100644 --- a/net/mptcp/fastopen.c +++ b/net/mptcp/fastopen.c @@ -40,13 +40,12 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf tp->copied_seq += skb->len; subflow->ssn_offset += skb->len; - /* initialize a dummy sequence number, we will update it at MPC - * completion, if needed - */ + /* Only the sequence delta is relevant */ MPTCP_SKB_CB(skb)->map_seq = -skb->len; MPTCP_SKB_CB(skb)->end_seq = 0; MPTCP_SKB_CB(skb)->offset = 0; MPTCP_SKB_CB(skb)->has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp; + MPTCP_SKB_CB(skb)->cant_coalesce = 1; mptcp_data_lock(sk); @@ -58,22 +57,3 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf mptcp_data_unlock(sk); } - -void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow, - const struct mptcp_options_received *mp_opt) -{ - struct sock *sk = (struct sock *)msk; - struct sk_buff *skb; - - skb = skb_peek_tail(&sk->sk_receive_queue); - if (skb) { - WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq); - pr_debug("msk %p moving seq %llx -> %llx end_seq %llx -> %llx\n", sk, - MPTCP_SKB_CB(skb)->map_seq, MPTCP_SKB_CB(skb)->map_seq + msk->ack_seq, - MPTCP_SKB_CB(skb)->end_seq, MPTCP_SKB_CB(skb)->end_seq + msk->ack_seq); - MPTCP_SKB_CB(skb)->map_seq += msk->ack_seq; - MPTCP_SKB_CB(skb)->end_seq += msk->ack_seq; - } - - pr_debug("msk=%p ack_seq=%llx\n", msk, msk->ack_seq); -} diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 6bd81904747066d8f2c1043dd81b372925f18cbb..55f9698f3c22f1dc423a7605c7b00bfda162b54c 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -135,7 +135,8 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, bool fragstolen; int delta; - if (MPTCP_SKB_CB(from)->offset || + if (unlikely(MPTCP_SKB_CB(to)->cant_coalesce) || + MPTCP_SKB_CB(from)->offset || ((to->len + from->len) > (sk->sk_rcvbuf >> 3)) || !skb_try_coalesce(to, from, &fragstolen, &delta)) return false; @@ -366,6 +367,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, MPTCP_SKB_CB(skb)->end_seq = MPTCP_SKB_CB(skb)->map_seq + copy_len; MPTCP_SKB_CB(skb)->offset = offset; MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp; + MPTCP_SKB_CB(skb)->cant_coalesce = 0; if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) { /* in sequence */ diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 37226cdd9e3717c4f8cf0d4c879a0feaaa91d459..3c3e9b185ae35d92b5a2daae994a4a9e76f9cc84 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -129,7 +129,8 @@ struct mptcp_skb_cb { u64 map_seq; u64 end_seq; u32 offset; - u8 has_rxtstamp:1; + u8 has_rxtstamp; + u8 cant_coalesce; }; #define MPTCP_SKB_CB(__skb) ((struct mptcp_skb_cb *)&((__skb)->cb[0])) @@ -1059,8 +1060,6 @@ void mptcp_event_pm_listener(const struct sock *ssk, enum mptcp_event_type event); bool mptcp_userspace_pm_active(const struct mptcp_sock *msk); -void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow, - const struct mptcp_options_received *mp_opt); void mptcp_fastopen_subflow_synack_set_params(struct 
mptcp_subflow_context *subflow, struct request_sock *req); int mptcp_nl_fill_addr(struct sk_buff *skb, diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 2926bdf88e42c5f2db6875b00b4eca2dbf49dba2..d2caffa56bdd98f5fd9ef07fdcb3610ea186b848 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -802,9 +802,6 @@ void __mptcp_subflow_fully_established(struct mptcp_sock *msk, subflow_set_remote_key(msk, subflow, mp_opt); WRITE_ONCE(subflow->fully_established, true); WRITE_ONCE(msk->fully_established, true); - - if (subflow->is_mptfo) - __mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt); } static struct sock *subflow_syn_recv_sock(const struct sock *sk, From patchwork Tue Feb 18 18:36:14 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980500 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA9391F583C; Tue, 18 Feb 2025 18:36:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903798; cv=none; b=Keoqd5qSi4zOg85e3PNHi917NQEG3Yz5Ldg/ByWy6R71r+Wa0lSYlPKpEEH14cGiG6unyibWK+GSYA1mVHuu2xnw3IXM95SAOAh9o+wvnBAuQb2F+1Lm16hiyNEHDTO/OH4uDJqbeYEMKoyOh1BZCRZl2EEA3YW4z58t1yvFFcs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903798; c=relaxed/simple; bh=DyE6gQiuZ9Ci0OJxwDN8nNLr2u0AIT9fOHiDhwzebgY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=bxC2K5+31vV/HoDcKsGCqg6vn9o6POkcakEfuuScqAzhOajk5q0SmoGQfezx3UsAYYVwJ5IQytE+Ycke/n+TAlnKSoKLRS+8DYCtooW70MvH8cOrj/e2GLtJjHXTbptCkgwX9czu0NLvN13oql7FAd+jCdQLD6iFJIlueU/hdnI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Qr0p1No5; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Qr0p1No5" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EAF8FC4CEE4; Tue, 18 Feb 2025 18:36:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903798; bh=DyE6gQiuZ9Ci0OJxwDN8nNLr2u0AIT9fOHiDhwzebgY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Qr0p1No55a0JyGpnCunEIBqgL/NGu19xt+B6z7yptBJTsdm2CYWLf0psRDuihprAB 2mWV8n5Ux3Y+/lpnVPiWAQ9tu3pkgJ8WaZQ+ATVReE1qQCRpS/DQzHMPkrucFdg+p2 tx7nbbrC1zNhvDCx4IQxqKQtrBzeQpSNXcY9T5N4beMPpGfJLofC/QB0JU+3n/7GKP YiDg2iYtXtmiQojrdAvt2+puhYxTuLaGyP6AvJ9pUmA4XyXmzQ39XWXqOandc/VgE+ ZXGpxS8L9sgqm1451m46McMnhdTwYlp+cXqgLRbTAkSwi1aTgzklyF2Ymq4y7FVgXt TdCKRTzU820uQ== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:14 +0100 Subject: [PATCH net-next 3/7] mptcp: move the whole rx path under msk socket lock protection Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-3-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang , 
"David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Jamal Hadi Salim , Cong Wang , Jiri Pirko , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=12219; i=matttbe@kernel.org; h=from:subject:message-id; bh=O0n9mGfL/jxpDRwQYQi1erBk5pU+KUYXehVHyztmRzM=; b=owEBbQKS/ZANAwAIAfa3gk9CaaBzAcsmYgBntNMnEDdLb1X+h3v/kBFaoU6s4/tWXLSV8Xg4d yEQBj02vxmJAjMEAAEIAB0WIQToy4X3aHcFem4n93r2t4JPQmmgcwUCZ7TTJwAKCRD2t4JPQmmg c2t/EADGl0VjWzDRF47ez0O+LtwGEpLHyU9uMq7Z8bDHYCzYqO9CnoYdjnWOGgKlxtBRPmt2srZ poIG5Ab5FNYzxSVlCfWzdIbOxjWFTwB+Z5ZBYIl3SQ1NTt4vJ1rnjnlvKd4MzmBbuPanlI1HuLd J1PuYHTUWzGhFOBjAQplWzl3dCfbe1wEu6N519QfZSp22o1Er4igqD/FUPdFi6sw4saQe3QydZX hOqL/sdLV0Ygn1wFsHu76U25gxkXTtv+ABR5LOvP5KDQEhlQww023Mxtm8AWAYr5Z1vJVmfKH+f 5L0Z2pJlfvyIH3NSRKVFwqOF9YM1p3Kh5JTHm7+BWzDmvJIlhAs1g0QKCCL+n6o+M1/w7L+nQOe a6FqGtARa6o25wBb36OXO6k5BNAD6hv/gj86gl8KeH8NBAU9ZvZHPC0H1cuug/t0wiQevL/WSIl sr5Ka1YbJ4qwInkteTgwBQg6A49BOHz0ylH2MT/d4T0k0nHszGnEgt5JX6rrIDKMgfmKsLuk0d/ Q5Z6243RTPMxsSt1OA2yx+V/QgyKu7Cbl7VAMtIfBnBeZxE6xx0ESHzF0qyZ2bkTYrNgjjfg9Yb xTjszJKhHvT4+sIUzlpB7+FDWC15bAFBHGXJZcCCU/+yO2owvPesPcCQnUD5SnfSvEyeTriMEvW lcNvdLd0z53BGWg== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 X-Patchwork-Delegate: kuba@kernel.org From: Paolo Abeni After commit c2e6048fa1cf ("mptcp: fix race in release_cb") we can move the whole MPTCP rx path under the socket lock leveraging the release_cb. We can drop a bunch of spin_lock pairs in the receive functions, use a single receive queue and invoke __mptcp_move_skbs only when subflows ask for it. This will allow more cleanup in the next patch. Some changes are worth specific mention: The msk rcvbuf update now always happens under both the msk and the subflow socket lock: we can drop a bunch of ONCE annotation and consolidate the checks. When the skbs move is delayed at msk release callback time, even the msk rcvbuf update is delayed; additionally take care of such action in __mptcp_move_skbs(). 
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/fastopen.c | 1 + net/mptcp/protocol.c | 123 ++++++++++++++++++++++++--------------------------- net/mptcp/protocol.h | 2 +- 3 files changed, 60 insertions(+), 66 deletions(-) diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c index 7777f5a2d14379853fcd13c4b57c5569be05a2e4..f85ad19f3dd6c4bcbf31228054ccfd30755db5bc 100644 --- a/net/mptcp/fastopen.c +++ b/net/mptcp/fastopen.c @@ -48,6 +48,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf MPTCP_SKB_CB(skb)->cant_coalesce = 1; mptcp_data_lock(sk); + DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); mptcp_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 55f9698f3c22f1dc423a7605c7b00bfda162b54c..8bdc7a7a58f31ac74d6a2156b2297af9cd90c635 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -645,18 +645,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, bool more_data_avail; struct tcp_sock *tp; bool done = false; - int sk_rbuf; - - sk_rbuf = READ_ONCE(sk->sk_rcvbuf); - - if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { - int ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf); - - if (unlikely(ssk_rbuf > sk_rbuf)) { - WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf); - sk_rbuf = ssk_rbuf; - } - } pr_debug("msk=%p ssk=%p\n", msk, ssk); tp = tcp_sk(ssk); @@ -724,7 +712,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, WRITE_ONCE(tp->copied_seq, seq); more_data_avail = mptcp_subflow_data_available(ssk); - if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf) { + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) { done = true; break; } @@ -848,11 +836,30 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk) return moved > 0; } +static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk) +{ + if (unlikely(ssk->sk_rcvbuf > sk->sk_rcvbuf)) + WRITE_ONCE(sk->sk_rcvbuf, ssk->sk_rcvbuf); +} + +static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + __mptcp_rcvbuf_update(sk, ssk); + + /* over limit? can't append more skbs to msk, Also, no need to wake-up*/ + if (__mptcp_rmem(sk) > sk->sk_rcvbuf) + return; + + /* Wake-up the reader only for in-sequence data */ + if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) + sk->sk_data_ready(sk); +} + void mptcp_data_ready(struct sock *sk, struct sock *ssk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); - struct mptcp_sock *msk = mptcp_sk(sk); - int sk_rbuf, ssk_rbuf; /* The peer can send data while we are shutting down this * subflow at msk destruction time, but we must avoid enqueuing @@ -861,19 +868,11 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk) if (unlikely(subflow->disposable)) return; - ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf); - sk_rbuf = READ_ONCE(sk->sk_rcvbuf); - if (unlikely(ssk_rbuf > sk_rbuf)) - sk_rbuf = ssk_rbuf; - - /* over limit? 
can't append more skbs to msk, Also, no need to wake-up*/ - if (__mptcp_rmem(sk) > sk_rbuf) - return; - - /* Wake-up the reader only for in-sequence data */ mptcp_data_lock(sk); - if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) - sk->sk_data_ready(sk); + if (!sock_owned_by_user(sk)) + __mptcp_data_ready(sk, ssk); + else + __set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags); mptcp_data_unlock(sk); } @@ -1946,16 +1945,17 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied); -static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, +static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg, size_t len, int flags, struct scm_timestamping_internal *tss, int *cmsg_flags) { + struct mptcp_sock *msk = mptcp_sk(sk); struct sk_buff *skb, *tmp; int copied = 0; - skb_queue_walk_safe(&msk->receive_queue, skb, tmp) { + skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) { u32 offset = MPTCP_SKB_CB(skb)->offset; u32 data_len = skb->len - offset; u32 count = min_t(size_t, len - copied, data_len); @@ -1990,7 +1990,7 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, /* we will bulk release the skb memory later */ skb->destructor = NULL; WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize); - __skb_unlink(skb, &msk->receive_queue); + __skb_unlink(skb, &sk->sk_receive_queue); __kfree_skb(skb); msk->bytes_consumed += count; } @@ -2115,54 +2115,46 @@ static void __mptcp_update_rmem(struct sock *sk) WRITE_ONCE(msk->rmem_released, 0); } -static void __mptcp_splice_receive_queue(struct sock *sk) +static bool __mptcp_move_skbs(struct sock *sk) { + struct mptcp_subflow_context *subflow; struct mptcp_sock *msk = mptcp_sk(sk); - - skb_queue_splice_tail_init(&sk->sk_receive_queue, &msk->receive_queue); -} - -static bool __mptcp_move_skbs(struct mptcp_sock *msk) -{ - struct sock *sk = (struct sock *)msk; unsigned int moved = 0; bool ret, done; + /* verify we can move any data from the subflow, eventually updating */ + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) + mptcp_for_each_subflow(msk, subflow) + __mptcp_rcvbuf_update(sk, subflow->tcp_sock); + + if (__mptcp_rmem(sk) > sk->sk_rcvbuf) + return false; + do { struct sock *ssk = mptcp_subflow_recv_lookup(msk); bool slowpath; - /* we can have data pending in the subflows only if the msk - * receive buffer was full at subflow_data_ready() time, - * that is an unlikely slow path. 
- */ - if (likely(!ssk)) + if (unlikely(!ssk)) break; slowpath = lock_sock_fast(ssk); - mptcp_data_lock(sk); __mptcp_update_rmem(sk); done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved); - mptcp_data_unlock(sk); if (unlikely(ssk->sk_err)) __mptcp_error_report(sk); unlock_sock_fast(ssk, slowpath); } while (!done); - /* acquire the data lock only if some input data is pending */ ret = moved > 0; if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) || - !skb_queue_empty_lockless(&sk->sk_receive_queue)) { - mptcp_data_lock(sk); + !skb_queue_empty(&sk->sk_receive_queue)) { __mptcp_update_rmem(sk); ret |= __mptcp_ofo_queue(msk); - __mptcp_splice_receive_queue(sk); - mptcp_data_unlock(sk); } if (ret) mptcp_check_data_fin((struct sock *)msk); - return !skb_queue_empty(&msk->receive_queue); + return ret; } static unsigned int mptcp_inq_hint(const struct sock *sk) @@ -2170,7 +2162,7 @@ static unsigned int mptcp_inq_hint(const struct sock *sk) const struct mptcp_sock *msk = mptcp_sk(sk); const struct sk_buff *skb; - skb = skb_peek(&msk->receive_queue); + skb = skb_peek(&sk->sk_receive_queue); if (skb) { u64 hint_val = READ_ONCE(msk->ack_seq) - MPTCP_SKB_CB(skb)->map_seq; @@ -2216,7 +2208,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, while (copied < len) { int err, bytes_read; - bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags); + bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, &tss, &cmsg_flags); if (unlikely(bytes_read < 0)) { if (!copied) copied = bytes_read; @@ -2225,7 +2217,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, copied += bytes_read; - if (skb_queue_empty(&msk->receive_queue) && __mptcp_move_skbs(msk)) + if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk)) continue; /* only the MPTCP socket status is relevant here. 
The exit @@ -2251,7 +2243,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, /* race breaker: the shutdown could be after the * previous receive queue check */ - if (__mptcp_move_skbs(msk)) + if (__mptcp_move_skbs(sk)) continue; break; } @@ -2295,9 +2287,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } } - pr_debug("msk=%p rx queue empty=%d:%d copied=%d\n", - msk, skb_queue_empty_lockless(&sk->sk_receive_queue), - skb_queue_empty(&msk->receive_queue), copied); + pr_debug("msk=%p rx queue empty=%d copied=%d\n", + msk, skb_queue_empty(&sk->sk_receive_queue), copied); release_sock(sk); return copied; @@ -2824,7 +2815,6 @@ static void __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->join_list); INIT_LIST_HEAD(&msk->rtx_queue); INIT_WORK(&msk->work, mptcp_worker); - __skb_queue_head_init(&msk->receive_queue); msk->out_of_order_queue = RB_ROOT; msk->first_pending = NULL; WRITE_ONCE(msk->rmem_fwd_alloc, 0); @@ -3407,12 +3397,8 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags) mptcp_for_each_subflow_safe(msk, subflow, tmp) __mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags); - /* move to sk_receive_queue, sk_stream_kill_queues will purge it */ - mptcp_data_lock(sk); - skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue); __skb_queue_purge(&sk->sk_receive_queue); skb_rbtree_purge(&msk->out_of_order_queue); - mptcp_data_unlock(sk); /* move all the rx fwd alloc into the sk_mem_reclaim_final in * inet_sock_destruct() will dispose it @@ -3455,7 +3441,8 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk) #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \ BIT(MPTCP_RETRANSMIT) | \ - BIT(MPTCP_FLUSH_JOIN_LIST)) + BIT(MPTCP_FLUSH_JOIN_LIST) | \ + BIT(MPTCP_DEQUEUE)) /* processes deferred events and flush wmem */ static void mptcp_release_cb(struct sock *sk) @@ -3489,6 +3476,11 @@ static void mptcp_release_cb(struct sock *sk) __mptcp_push_pending(sk, 0); if (flags & BIT(MPTCP_RETRANSMIT)) __mptcp_retrans(sk); + if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) { + /* notify ack seq update */ + mptcp_cleanup_rbuf(msk, 0); + sk->sk_data_ready(sk); + } cond_resched(); spin_lock_bh(&sk->sk_lock.slock); @@ -3726,7 +3718,8 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg) return -EINVAL; lock_sock(sk); - __mptcp_move_skbs(msk); + if (__mptcp_move_skbs(sk)) + mptcp_cleanup_rbuf(msk, 0); *karg = mptcp_inq_hint(sk); release_sock(sk); break; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 3c3e9b185ae35d92b5a2daae994a4a9e76f9cc84..753456b73f90879126a36964924d2b6e08e2a1cc 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -124,6 +124,7 @@ #define MPTCP_FLUSH_JOIN_LIST 5 #define MPTCP_SYNC_STATE 6 #define MPTCP_SYNC_SNDBUF 7 +#define MPTCP_DEQUEUE 8 struct mptcp_skb_cb { u64 map_seq; @@ -325,7 +326,6 @@ struct mptcp_sock { struct work_struct work; struct sk_buff *ooo_last_skb; struct rb_root out_of_order_queue; - struct sk_buff_head receive_queue; struct list_head conn_list; struct list_head rtx_queue; struct mptcp_data_frag *first_pending; From patchwork Tue Feb 18 18:36:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980501 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B38C1F585B; Tue, 18 Feb 2025 18:36:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903802; cv=none; b=Hb0UZlQFT7/m24YL6+mGS/aR35JSOeuCYHT5KlHhq3sCXxb9IoaVg75FClwb4XigN7A+ZE4ar9FWOrjkdrDrhCWtC6tmk/aQJ51n1C3NdBQwyfsCo9qZDH3XoDWHotJ9oUXwTrKEJUVnmH5C4RaAuuuI8fyLP/cvloipO2KWJO8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903802; c=relaxed/simple; bh=OoEmchlQBBotC5o70iyC1CwCPNurWZ6JNiqtCzdY7Ws=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=lypTP0a8OQ1htomv4YkBqHsAYvlMGwlWDSAQCCzY/igTOSpidLqKOgyPbAzSNzNqlBTodsIEjtredo9M10Prvyra/3OmGFXn1Bg9Y3CvkILYUgMlfjEXpTg97hv2SH5ZDkBDtwQ6240lfgZ+c/mx65/OqnHbTME5CNT2Lwjv7Hc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FmXOXYgG; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FmXOXYgG" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 973B8C4CEE2; Tue, 18 Feb 2025 18:36:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903801; bh=OoEmchlQBBotC5o70iyC1CwCPNurWZ6JNiqtCzdY7Ws=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=FmXOXYgG5F/9gWkBrgA7YhMb8jEVRbVFSQB5Gx+ffUjJIA4Q8bJ7ofwhpj7hTvFpV GNqTmM7LnivKyszB5CT4sBYJnSxRGjm1MCQjmxuzC+jehahjw4cKrAe1SyQBcIQWbj TDVDTsoU4j564dDlY7owPNGRdcEurpTfvnIXAyzAbjtPLeShYO8QKnM0vSsVlQKU88 WPdb8CntBKaOu684Z6CFStRGhH95RT6mWjykw4P9vSckqNRrPmySwuHQzc8CpqKiNe uHblHAfrixb0WkCQusX5BJF0ZutYW5JDyiXsNwwNWkga4bcmiuLj3N0q1l1cEASvRN h2lPggPpvYQfw== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:15 +0100 Subject: [PATCH net-next 4/7] mptcp: cleanup mem accounting Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-4-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Jamal Hadi Salim , Cong Wang , Jiri Pirko , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=9614; i=matttbe@kernel.org; h=from:subject:message-id; bh=nH6TbZ5V5qQ8waWiktorcZY2CmpT15tov8l5vC4mv4w=; b=owEBbQKS/ZANAwAIAfa3gk9CaaBzAcsmYgBntNMnJOFs2Jj+z1+zpnXC4sNW8qFmbPHS4cKev 9Ihdti/jfSJAjMEAAEIAB0WIQToy4X3aHcFem4n93r2t4JPQmmgcwUCZ7TTJwAKCRD2t4JPQmmg cwAkEADOPpslPpgdDoDz+9BQ2gZLK5XioxOiTRRCO/fNb1YX/gSwiS5O/Sbwq6ECQw79B+H3Y7z KV22bDUv0lL98m1971Rm/USGLmoEcaYBs0bhlIUGAq+Rr1+DOZG64o6p34W3qgv05oxdxAelLgn 2sQ1IOEelqfQjm4qk6M6/mx4zndcAZF56XCpxH6YzFS4hp8CfQDAnEqgLQ8vBIASWbwW02IeSX7 ySx8xxSiWKiFfOAoBmya6Xr2ifG0Vj+OXKyZiPo9oQqt8nUv/sZFFWEdqryYNM/Uin4SgO2oGcg Cjl9cd1bzs2gMo2FmOV+6UMIHsUef56Iz9UmHgH5SHGvrWWVTOJJOL1IhT56pK/aSIx1WslZg3o veudCoFr3TiAs8SEWr/y3m3AnncD1HT9aCcfBDC2BTpwt7myiD2Nsq2eaQXfDwEsx03yWnNhqLZ aF5YQEpeoKywzpuNZz0C8dpJIbCCLYcAmr8EdEEo+vyayyY8eQ9ZSs3Faak+aEvzBIDputXPMJe Cj/5ivwOU4FwSteTge5i546ZbyT6ouMGp5TqJJJJP3+uS+G6+lk7jHyRlR/+Ahey8EuOTXTUD52 aBYrMa16xGoY2lvR4Artn18Tzs1jBsPw0HzZrd7NyDfvlGoEVLQ07PDASkDzSvS4ERyJo8IUBFu MwPax0mduj+eIlw== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 X-Patchwork-Delegate: kuba@kernel.org From: Paolo Abeni After the previous patch, updating sk_forward_memory is cheap and we can drop a lot of complexity from the MPTCP memory accounting, removing the custom fwd mem allocations for rmem. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/fastopen.c | 2 +- net/mptcp/protocol.c | 115 ++++----------------------------------------------- net/mptcp/protocol.h | 4 +- 3 files changed, 10 insertions(+), 111 deletions(-) diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c index f85ad19f3dd6c4bcbf31228054ccfd30755db5bc..b9e4511979028c10d232efbcaca68400fc4f2e7a 100644 --- a/net/mptcp/fastopen.c +++ b/net/mptcp/fastopen.c @@ -50,7 +50,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf mptcp_data_lock(sk); DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); - mptcp_set_owner_r(skb, sk); + skb_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); mptcp_sk(sk)->bytes_received += skb->len; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 8bdc7a7a58f31ac74d6a2156b2297af9cd90c635..080877f8daf7e3ff36531f3e11079d2163676f2d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -118,17 +118,6 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb) __kfree_skb(skb); } -static void mptcp_rmem_fwd_alloc_add(struct sock *sk, int size) -{ - WRITE_ONCE(mptcp_sk(sk)->rmem_fwd_alloc, - mptcp_sk(sk)->rmem_fwd_alloc + size); -} - -static void mptcp_rmem_charge(struct sock *sk, int size) -{ - mptcp_rmem_fwd_alloc_add(sk, -size); -} - static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, struct sk_buff *from) { @@ -151,7 +140,7 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, * negative one */ atomic_add(delta, &sk->sk_rmem_alloc); - mptcp_rmem_charge(sk, delta); + sk_mem_charge(sk, delta); kfree_skb_partial(from, fragstolen); return true; @@ -166,44 +155,6 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to, return mptcp_try_coalesce((struct sock *)msk, to, from); } -static void __mptcp_rmem_reclaim(struct sock *sk, int 
amount) -{ - amount >>= PAGE_SHIFT; - mptcp_rmem_charge(sk, amount << PAGE_SHIFT); - __sk_mem_reduce_allocated(sk, amount); -} - -static void mptcp_rmem_uncharge(struct sock *sk, int size) -{ - struct mptcp_sock *msk = mptcp_sk(sk); - int reclaimable; - - mptcp_rmem_fwd_alloc_add(sk, size); - reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk); - - /* see sk_mem_uncharge() for the rationale behind the following schema */ - if (unlikely(reclaimable >= PAGE_SIZE)) - __mptcp_rmem_reclaim(sk, reclaimable); -} - -static void mptcp_rfree(struct sk_buff *skb) -{ - unsigned int len = skb->truesize; - struct sock *sk = skb->sk; - - atomic_sub(len, &sk->sk_rmem_alloc); - mptcp_rmem_uncharge(sk, len); -} - -void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk) -{ - skb_orphan(skb); - skb->sk = sk; - skb->destructor = mptcp_rfree; - atomic_add(skb->truesize, &sk->sk_rmem_alloc); - mptcp_rmem_charge(sk, skb->truesize); -} - /* "inspired" by tcp_data_queue_ofo(), main differences: * - use mptcp seqs * - don't cope with sacks @@ -316,25 +267,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb) end: skb_condense(skb); - mptcp_set_owner_r(skb, sk); -} - -static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size) -{ - struct mptcp_sock *msk = mptcp_sk(sk); - int amt, amount; - - if (size <= msk->rmem_fwd_alloc) - return true; - - size -= msk->rmem_fwd_alloc; - amt = sk_mem_pages(size); - amount = amt << PAGE_SHIFT; - if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV)) - return false; - - mptcp_rmem_fwd_alloc_add(sk, amount); - return true; + skb_set_owner_r(skb, sk); } static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, @@ -352,7 +285,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, skb_orphan(skb); /* try to fetch required memory from subflow */ - if (!mptcp_rmem_schedule(sk, ssk, skb->truesize)) { + if (!sk_rmem_schedule(sk, skb, skb->truesize)) { MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); goto drop; } @@ -377,7 +310,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, if (tail && mptcp_try_coalesce(sk, tail, skb)) return true; - mptcp_set_owner_r(skb, sk); + skb_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); return true; } else if (after64(MPTCP_SKB_CB(skb)->map_seq, msk->ack_seq)) { @@ -1987,9 +1920,10 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, } if (!(flags & MSG_PEEK)) { - /* we will bulk release the skb memory later */ + /* avoid the indirect call, we know the destructor is sock_wfree */ skb->destructor = NULL; - WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize); + atomic_sub(skb->truesize, &sk->sk_rmem_alloc); + sk_mem_uncharge(sk, skb->truesize); __skb_unlink(skb, &sk->sk_receive_queue); __kfree_skb(skb); msk->bytes_consumed += count; @@ -2103,18 +2037,6 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) msk->rcvq_space.time = mstamp; } -static void __mptcp_update_rmem(struct sock *sk) -{ - struct mptcp_sock *msk = mptcp_sk(sk); - - if (!msk->rmem_released) - return; - - atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc); - mptcp_rmem_uncharge(sk, msk->rmem_released); - WRITE_ONCE(msk->rmem_released, 0); -} - static bool __mptcp_move_skbs(struct sock *sk) { struct mptcp_subflow_context *subflow; @@ -2138,7 +2060,6 @@ static bool __mptcp_move_skbs(struct sock *sk) break; slowpath = lock_sock_fast(ssk); - __mptcp_update_rmem(sk); done = __mptcp_move_skbs_from_subflow(msk, ssk, 
&moved); if (unlikely(ssk->sk_err)) @@ -2146,12 +2067,7 @@ static bool __mptcp_move_skbs(struct sock *sk) unlock_sock_fast(ssk, slowpath); } while (!done); - ret = moved > 0; - if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) || - !skb_queue_empty(&sk->sk_receive_queue)) { - __mptcp_update_rmem(sk); - ret |= __mptcp_ofo_queue(msk); - } + ret = moved > 0 || __mptcp_ofo_queue(msk); if (ret) mptcp_check_data_fin((struct sock *)msk); return ret; @@ -2817,8 +2733,6 @@ static void __mptcp_init_sock(struct sock *sk) INIT_WORK(&msk->work, mptcp_worker); msk->out_of_order_queue = RB_ROOT; msk->first_pending = NULL; - WRITE_ONCE(msk->rmem_fwd_alloc, 0); - WRITE_ONCE(msk->rmem_released, 0); msk->timer_ival = TCP_RTO_MIN; msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO; @@ -3044,8 +2958,6 @@ static void __mptcp_destroy_sock(struct sock *sk) sk->sk_prot->destroy(sk); - WARN_ON_ONCE(READ_ONCE(msk->rmem_fwd_alloc)); - WARN_ON_ONCE(msk->rmem_released); sk_stream_kill_queues(sk); xfrm_sk_free_policy(sk); @@ -3403,8 +3315,6 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags) /* move all the rx fwd alloc into the sk_mem_reclaim_final in * inet_sock_destruct() will dispose it */ - sk_forward_alloc_add(sk, msk->rmem_fwd_alloc); - WRITE_ONCE(msk->rmem_fwd_alloc, 0); mptcp_token_destroy(msk); mptcp_pm_free_anno_list(msk); mptcp_free_local_addr_list(msk); @@ -3500,8 +3410,6 @@ static void mptcp_release_cb(struct sock *sk) if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags)) __mptcp_sync_sndbuf(sk); } - - __mptcp_update_rmem(sk); } /* MP_JOIN client subflow must wait for 4th ack before sending any data: @@ -3672,12 +3580,6 @@ static void mptcp_shutdown(struct sock *sk, int how) __mptcp_wr_shutdown(sk); } -static int mptcp_forward_alloc_get(const struct sock *sk) -{ - return READ_ONCE(sk->sk_forward_alloc) + - READ_ONCE(mptcp_sk(sk)->rmem_fwd_alloc); -} - static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v) { const struct sock *sk = (void *)msk; @@ -3836,7 +3738,6 @@ static struct proto mptcp_prot = { .hash = mptcp_hash, .unhash = mptcp_unhash, .get_port = mptcp_get_port, - .forward_alloc_get = mptcp_forward_alloc_get, .stream_memory_free = mptcp_stream_memory_free, .sockets_allocated = &mptcp_sockets_allocated, diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 753456b73f90879126a36964924d2b6e08e2a1cc..613d556ed938a99a2800b4384ee4c6cda9483381 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -281,7 +281,6 @@ struct mptcp_sock { u64 rcv_data_fin_seq; u64 bytes_retrans; u64 bytes_consumed; - int rmem_fwd_alloc; int snd_burst; int old_wspace; u64 recovery_snd_nxt; /* in recovery mode accept up to this seq; @@ -296,7 +295,6 @@ struct mptcp_sock { u32 last_ack_recv; unsigned long timer_ival; u32 token; - int rmem_released; unsigned long flags; unsigned long cb_flags; bool recovery; /* closing subflow write queue reinjected */ @@ -387,7 +385,7 @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk) */ static inline int __mptcp_rmem(const struct sock *sk) { - return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released); + return atomic_read(&sk->sk_rmem_alloc); } static inline int mptcp_win_from_space(const struct sock *sk, int space) From patchwork Tue Feb 18 18:36:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980502 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org 
(aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 96DB726B2D3; Tue, 18 Feb 2025 18:36:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903805; cv=none; b=dAaBdk29YyM9KVDqUT+81cfLk3DJ8pL8XU6xjT3+2Zc99ZunfWgx3erh4b4an2cAAbcrhCioc9lwqXh3RcUrI9mWWNojUk99mfv5VFvQvnG4xJMfvvuonyNQeY31kK3UMx/YHE4pL058O07a9V1aBWqXhv6FYW4PTbXiC2QTsoI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903805; c=relaxed/simple; bh=a2n1iRP+j6iFehN5PBuGMpY43pwjb/WhoWJ7Nx56u8Q=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=g0MD9TSPcQqmkY9caPwCPMEarSUKy2ocL7M7NmItJfvOcyCGUXFNiiARox0N62W2G77n9XZvPb/5/HdIUZQxdEDuRuh7pTEYyk1hblIMUHsVOcKsflygoRuAMR6UtJzB9iXltL3j/9VG+w0b9HCkjKvi9jXAqv9Sj5kamSHv3T8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WKs+BKHb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WKs+BKHb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3FCF8C4CEE4; Tue, 18 Feb 2025 18:36:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903805; bh=a2n1iRP+j6iFehN5PBuGMpY43pwjb/WhoWJ7Nx56u8Q=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=WKs+BKHbuj7T4KehhFAsemBS9FoQrVFyGcFLIBh6gTLvvt/C7b+h0yc+ivlDX4fK3 73ME2Fhv4n8jUEvLzIwgpvKdemviZwsikf1hBA4N44uzEZahOAjjXR5lHllGZfp57b xtml2Rdgbl5tBWPCmJL0+40kkKmWhUQt00BNKhIa9f4r2bju/W38dIrZO4QcjPEdiL ZbY8dV6Ika6wQ4bZ0TYxEo3G+8bsJWbqmDrhPSY9d8FZPqi3r/Idj3TTHO1k2l68I5 DHLm/ms9yPrFWngzGl1HxQ5050Re88HJu+qPk7rjQsJgk8TBgNLMmK0Y/bmn5u5O9I HEaNkNAtWpRRQ== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:16 +0100 Subject: [PATCH net-next 5/7] net: dismiss sk_forward_alloc_get() Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-5-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Jamal Hadi Salim , Cong Wang , Jiri Pirko , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=4080; i=matttbe@kernel.org; h=from:subject:message-id; bh=viiaOyZW64iyCHKnCj0hoDEEHUJ4EIG7vZ9h8lhf3Gk=; b=owEBbQKS/ZANAwAIAfa3gk9CaaBzAcsmYgBntNMnYA3KhIc1NRpfdciZGIALMCW6GMTMW20Z2 V3Hu/DXVT2JAjMEAAEIAB0WIQToy4X3aHcFem4n93r2t4JPQmmgcwUCZ7TTJwAKCRD2t4JPQmmg cw4FD/4ttVC+hSUqJBWuS+exx/4G58u/1diepZ3yK0/xbX2eg6hCflT/KNpzWouEyINnwZph4L0 4EhKU1109fFZSphu7/aGh2nQwfJ+qQB4PdEZAUZTPL0TOaTxoKRKn9sAch3vzeUHN3+lo+eO/fz TLQthg5ZJV/pIBh84aWweCaYSSwEMVjpF7IRFo41exV5UdlctHpMVZX4CxQd5qEdx7+e+ywD2nf Ur25vuB2je4eCMXN71sSTccReKW9NVJJxxSua4hAkXJGCXQDgFd8ShlKJdUyZFbSeU0Nnhz1+jp pcIsMak4veyWLXw/tDFmWTXACkH0RIpM9ZtsI2XhfQbAMygvXOolh+Ze1jt5mWFzb3NTF2vW0N1 VeA5VLy7PA1JJw6d/fdxeB45y0uXwf5DkBHBoscZOTeDIQbgbrw9v9Dyb2zCuDlpEpJK1OMq/SW YbxJpSOaA0hu7f9jvS94wlyQrgfFUaNlawei8EJ9pQBRbAUBnzJnQk8wJcNUdthERLggW+1lKWV wyjpFoIOg5s7XIxYihXrYVtd6NqrHKit8kR3xTTi/CuKtFjeQfLUAmVYtO1bsW+84ZnorwmFG+y qQg/ilNkE1EyjHgSd807RQ7IGzF1Gelivm56TRBbk5x6qoFDQizDscz1mydzC1I8wLEGYDE8D87 nzrJ7Z5XWVHBP0g== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 X-Patchwork-Delegate: kuba@kernel.org From: Paolo Abeni After the previous patch we can remove the forward_alloc_get proto callback, basically reverting commit 292e6077b040 ("net: introduce sk_forward_alloc_get()") and commit 66d58f046c9d ("net: use sk_forward_alloc_get() in sk_get_meminfo()"). Signed-off-by: Paolo Abeni Acked-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- include/net/sock.h | 13 ------------- net/core/sock.c | 2 +- net/ipv4/af_inet.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/sched/em_meta.c | 2 +- 5 files changed, 4 insertions(+), 17 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 60ebf3c7b229e257b164e0de1f56543ea69f38f3..ac7fb5bd8ef9af10135a6e703408f2b24bd3d713 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1285,10 +1285,6 @@ struct proto { unsigned int inuse_idx; #endif -#if IS_ENABLED(CONFIG_MPTCP) - int (*forward_alloc_get)(const struct sock *sk); -#endif - bool (*stream_memory_free)(const struct sock *sk, int wake); bool (*sock_is_readable)(struct sock *sk); /* Memory pressure */ @@ -1349,15 +1345,6 @@ int sock_load_diag_module(int family, int protocol); INDIRECT_CALLABLE_DECLARE(bool tcp_stream_memory_free(const struct sock *sk, int wake)); -static inline int sk_forward_alloc_get(const struct sock *sk) -{ -#if IS_ENABLED(CONFIG_MPTCP) - if (sk->sk_prot->forward_alloc_get) - return sk->sk_prot->forward_alloc_get(sk); -#endif - return READ_ONCE(sk->sk_forward_alloc); -} - static inline bool __sk_stream_memory_free(const struct sock *sk, int wake) { if (READ_ONCE(sk->sk_wmem_queued) >= READ_ONCE(sk->sk_sndbuf)) diff --git a/net/core/sock.c b/net/core/sock.c index 53c7af0038c4fca630e1ac2ebecf55558cb16eef..0d385bf27b38d97458e6a695a559f4f1600773c4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3882,7 +3882,7 @@ void sk_get_meminfo(const struct sock *sk, u32 *mem) mem[SK_MEMINFO_RCVBUF] = READ_ONCE(sk->sk_rcvbuf); mem[SK_MEMINFO_WMEM_ALLOC] = sk_wmem_alloc_get(sk); mem[SK_MEMINFO_SNDBUF] = READ_ONCE(sk->sk_sndbuf); - mem[SK_MEMINFO_FWD_ALLOC] = sk_forward_alloc_get(sk); + mem[SK_MEMINFO_FWD_ALLOC] = READ_ONCE(sk->sk_forward_alloc); mem[SK_MEMINFO_WMEM_QUEUED] = 
READ_ONCE(sk->sk_wmem_queued); mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc); mem[SK_MEMINFO_BACKLOG] = READ_ONCE(sk->sk_backlog.len); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 21f46ee7b6e95329a2f7f0e0429eebf1648e7f9d..5df1f1325259d9b9dbe3be19a81066f85cf306e5 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -153,7 +153,7 @@ void inet_sock_destruct(struct sock *sk) WARN_ON_ONCE(atomic_read(&sk->sk_rmem_alloc)); WARN_ON_ONCE(refcount_read(&sk->sk_wmem_alloc)); WARN_ON_ONCE(sk->sk_wmem_queued); - WARN_ON_ONCE(sk_forward_alloc_get(sk)); + WARN_ON_ONCE(sk->sk_forward_alloc); kfree(rcu_dereference_protected(inet->inet_opt, 1)); dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1)); diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 321acc8abf17e8c7d6a4e3326615123fff19deab..efe2a085cf68e90cd1e79b5556e667a0fd044bfd 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -282,7 +282,7 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, struct inet_diag_meminfo minfo = { .idiag_rmem = sk_rmem_alloc_get(sk), .idiag_wmem = READ_ONCE(sk->sk_wmem_queued), - .idiag_fmem = sk_forward_alloc_get(sk), + .idiag_fmem = READ_ONCE(sk->sk_forward_alloc), .idiag_tmem = sk_wmem_alloc_get(sk), }; diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c index 8996c73c9779b5fa804e6f913834cf1fe4d071e6..3f2e707a11d18922d7d9dd93e8315c1ab26eebc7 100644 --- a/net/sched/em_meta.c +++ b/net/sched/em_meta.c @@ -460,7 +460,7 @@ META_COLLECTOR(int_sk_fwd_alloc) *err = -1; return; } - dst->value = sk_forward_alloc_get(sk); + dst->value = READ_ONCE(sk->sk_forward_alloc); } META_COLLECTOR(int_sk_sndbuf) From patchwork Tue Feb 18 18:36:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980503 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 963ED270EC3; Tue, 18 Feb 2025 18:36:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903809; cv=none; b=JehsqM5h1oUSZCq3V81DbeevHfbUGw2H9r/SHT2Bi8oK7hKwI9jJZZ6dfKsHQW3ExoXdWfd4xhF3/nFTNJ3FaHguD3zIRnkCXDHMDUj0pxVZOc6YzSwOPUdcTzL0x86rLgf3S8o6W8yj1EwqvwzM8Erd8w17aa1xgPyPcVCPeuc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903809; c=relaxed/simple; bh=JHX0AHDU9Z3wO5E/PzOjf/tEoouvJ5ogW4zW/Wl9gQQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=YtbN/HmYpH5dIEEzxWO5HKKLPjLm6mnQBGCz3WuvnsGsNBGlFPN05CbHetpLBeNPK12PZaL+MwuepbzaXTyzLHbWJmBSJzfnSsp4TuoeP4LqXOJUOkUW399yV6dQ8GcBDWKYNorfKljsam1YelacN3mhA+JMmRBa9xRd5K7plqA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=d0EnWWX7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="d0EnWWX7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DB96AC4CEEB; Tue, 18 Feb 2025 18:36:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903809; 
From: "Matthieu Baerts (NGI0)" 
Date: Tue, 18 Feb 2025 19:36:17 +0100
Subject: [PATCH net-next 6/7] mptcp: dismiss __mptcp_rmem()
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
MIME-Version: 1.0
Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-6-4a47d90d7998@kernel.org>
References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org>
In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org>
To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang ,
 "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni ,
 Simon Horman 
Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern ,
 Jamal Hadi Salim , Cong Wang , Jiri Pirko ,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 "Matthieu Baerts (NGI0)" 
X-Mailer: b4 0.14.2
X-Patchwork-Delegate: kuba@kernel.org

From: Paolo Abeni

After the RX path refactor, __mptcp_rmem() became a plain wrapper
around the sk_rmem_alloc read, with a slightly misleading name. Just
drop it.
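For reference, after the earlier accounting cleanup the dropped helper
and the generic accessor read the same counter, so every call site can
simply switch over (a minimal before/after sketch; only the helper
bodies are shown):

/* Helper removed by this patch (its state after the accounting cleanup): */
static inline int __mptcp_rmem(const struct sock *sk)
{
	return atomic_read(&sk->sk_rmem_alloc);
}

/* Generic accessor used instead at every call site, e.g.
 *	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 *		return false;
 */
static inline int sk_rmem_alloc_get(const struct sock *sk)
{
	return atomic_read(&sk->sk_rmem_alloc);
}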
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 8 ++++---- net/mptcp/protocol.h | 11 ++--------- 2 files changed, 6 insertions(+), 13 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 080877f8daf7e3ff36531f3e11079d2163676f2d..c709f654cd5a4944390cf1e160f59cd3b509b66d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -496,7 +496,7 @@ static void mptcp_cleanup_rbuf(struct mptcp_sock *msk, int copied) bool cleanup, rx_empty; cleanup = (space > 0) && (space >= (old_space << 1)) && copied; - rx_empty = !__mptcp_rmem(sk) && copied; + rx_empty = !sk_rmem_alloc_get(sk) && copied; mptcp_for_each_subflow(msk, subflow) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); @@ -645,7 +645,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, WRITE_ONCE(tp->copied_seq, seq); more_data_avail = mptcp_subflow_data_available(ssk); - if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) { + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) { done = true; break; } @@ -782,7 +782,7 @@ static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) __mptcp_rcvbuf_update(sk, ssk); /* over limit? can't append more skbs to msk, Also, no need to wake-up*/ - if (__mptcp_rmem(sk) > sk->sk_rcvbuf) + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) return; /* Wake-up the reader only for in-sequence data */ @@ -2049,7 +2049,7 @@ static bool __mptcp_move_skbs(struct sock *sk) mptcp_for_each_subflow(msk, subflow) __mptcp_rcvbuf_update(sk, subflow->tcp_sock); - if (__mptcp_rmem(sk) > sk->sk_rcvbuf) + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) return false; do { diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 613d556ed938a99a2800b4384ee4c6cda9483381..a1a077bae7b6ec4fab5b266e2613acb145eb343f 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -380,14 +380,6 @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk) #define mptcp_sk(ptr) container_of_const(ptr, struct mptcp_sock, sk.icsk_inet.sk) #endif -/* the msk socket don't use the backlog, also account for the bulk - * free memory - */ -static inline int __mptcp_rmem(const struct sock *sk) -{ - return atomic_read(&sk->sk_rmem_alloc); -} - static inline int mptcp_win_from_space(const struct sock *sk, int space) { return __tcp_win_from_space(mptcp_sk(sk)->scaling_ratio, space); @@ -400,7 +392,8 @@ static inline int mptcp_space_from_win(const struct sock *sk, int win) static inline int __mptcp_space(const struct sock *sk) { - return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - __mptcp_rmem(sk)); + return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - + sk_rmem_alloc_get(sk)); } static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk) From patchwork Tue Feb 18 18:36:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthieu Baerts X-Patchwork-Id: 13980504 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 369511EB5DE; Tue, 18 Feb 2025 18:36:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903814; cv=none; 
b=HSu9CyzRH4ljXG7bdZbYybNH4IP5fctUnZh62h2PLpTsHBg/Vos3B5NTptu+6PP6DRue/zIdgLXM+i39xGAWhgff4ODETzYzKIgX9nfF/ai0pvmB9yslIW3UVs+IvP+X40LL47GE2lQxlSig3iLF3jZUBJmgG8MTLPVynG7BU7k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739903814; c=relaxed/simple; bh=FuEmwmBco4pfgcrOyMQPaOEQ5AMu1vAUJfMUV8FkDtc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=nqClFfjVUZUCIiV3YSbUVshEc+snQau+VjGAF703QqmU5ueM/ZoX6Ymhkjv2RC5dKoaBEYkvsA96G/rer0+NOI7zTcxnQGiYgNGs8Ck0lLDZJyPkLJTRhCe7uMzWYFuuQuRLeI0mpTmZgHPaU1ZTK0f4r5mKoWZSv9ke8s1oHb8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Ll7DrvDc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Ll7DrvDc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7FF04C4CEE2; Tue, 18 Feb 2025 18:36:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739903812; bh=FuEmwmBco4pfgcrOyMQPaOEQ5AMu1vAUJfMUV8FkDtc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Ll7DrvDc7wZJh0iSzzPZ8RkckswbLFwwQrdHbgPkcdKEBaXI+BZ64U6H7w0Krm/F1 80mM843qgQd1/37WYnhH+lL1uRTjVMwFKEVTDnDKxMJ4M3uPeBlqx3X1qU2wWUdXLk o4nF1zqYq6oxcT0szQUc0rmIVoH5YQmQI+GfizpvPA+t1BBF1/4Cx4fv3hCGTZQ5Zp Xa6IwUSZBtYbGozng/IpqcFNZkz4OM/uFRZIw9hsgXUrqGDLV2VBNIjDfLxEE6oUKz 29NcBhjv58HKPjOq7gmtVc4eq2TwWqRwKT4FkfBonKsfL/OtUQ2FZuQnWG75AZEXbt 07+8qN55+G14Q== From: "Matthieu Baerts (NGI0)" Date: Tue, 18 Feb 2025 19:36:18 +0100 Subject: [PATCH net-next 7/7] mptcp: micro-optimize __mptcp_move_skb() Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250218-net-next-mptcp-rx-path-refactor-v1-7-4a47d90d7998@kernel.org> References: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> In-Reply-To: <20250218-net-next-mptcp-rx-path-refactor-v1-0-4a47d90d7998@kernel.org> To: mptcp@lists.linux.dev, Mat Martineau , Geliang Tang , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman Cc: Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Jamal Hadi Salim , Cong Wang , Jiri Pirko , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=7810; i=matttbe@kernel.org; h=from:subject:message-id; bh=UUZ2l4CArzUWVM9zGO3re19UeTioW3f1Q/TZ8tu7/is=; b=owEBbQKS/ZANAwAIAfa3gk9CaaBzAcsmYgBntNMn/JObq6JQyNbP7lEygvD9z7ysS+k4zTqCV sJ4zhYrwAaJAjMEAAEIAB0WIQToy4X3aHcFem4n93r2t4JPQmmgcwUCZ7TTJwAKCRD2t4JPQmmg cxayEACwPcuobyxN+pAtEAADdCZLdNZpT7IZ+EI/nbBxCQfPNFKpdUgTGWvL+C7WZDFcNP1NSzQ zDDrGo2Z1yQz1eVf26tE+tg1LeLe6eEPWzfQqWXP9NKHBQjGk8e/HAI7p2SlB9RuiJpktlgbTTW wYdymPLYZLPK0QXPgOmshMk2XWYn9aNl4V1hZTspRsj4DJxmTbRDqSIJR3dvJJ+yrPj3QF893a+ lU8k6gLZAFVVqb6lEhHNyBwNs3HIckqiaJv3d+HJx76OoCAzH43X95gxWsg2yFB+F7d6Gw6wsml OhqxxCSMmB6MS5/QUN5YzJfoNwsCPMipn4esD4yueJf7P5ln8O4UEJpOt5p60UVXcMfNpc/+22q Xu9tvzQDlLucEc6TDl7TIXNh8JQiCFnFcVWbLDkdHrgNQgSY6XUT6KCb+iJhUtBO/mCYTwBdtS2 DDEjhCsavdE4e1D+cBfY0ly/KXBOKV/K+8Ep8hIked6qfdf8s7USRbbULgDegmJ+mRC76xKbDms Oh/K+vYc4cPiXylehd4l0410OSA16NJFZkVqaX6X6BlqMqcdsxmmHwC0xNJG6a12Gf4cmgCWHeK 4tJoc8Yv464IbhQacUABmcBycEkYjBS0gIDJIL46ZNZFk1PDUDLy54oUGfn775WrQGma3V6CwlQ rUR+8iHmSslTmtg== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 X-Patchwork-Delegate: kuba@kernel.org From: Paolo Abeni After the RX path refactor, the mentioned function is expected to run frequently; let's optimize it a bit. Scan for ready subflows starting from the last processed one, and stop after traversing the list once or after reaching the msk memory limit, instead of checking dubious per-subflow conditions. Also re-order the memory limit checks to avoid duplicate tests.
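To make the new scan easier to follow, here is a simplified restatement of the lookup introduced below; it is a sketch with the same structure as __mptcp_first_ready_from() in the diff (the function name here is illustrative), and it assumes the mptcp_next_subflow() helper added by this patch plus a non-empty conn_list.

/* Starting at @start, walk the circular subflow list at most once and
 * return the first subflow with data available, or NULL if none is ready.
 */
static struct mptcp_subflow_context *
first_ready_subflow_from(struct mptcp_sock *msk,
			 struct mptcp_subflow_context *start)
{
	struct mptcp_subflow_context *subflow = start;

	do {
		if (READ_ONCE(subflow->data_avail))
			return subflow;
		subflow = mptcp_next_subflow(msk, subflow);
	} while (subflow != start);

	return NULL;
}

The caller, __mptcp_move_skbs(), keeps advancing to the subflow after the one it just drained, and stops once a full pass finds nothing ready or the msk receive buffer limit is reached.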
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 111 +++++++++++++++++++++++---------------------------- net/mptcp/protocol.h | 2 + 2 files changed, 52 insertions(+), 61 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c709f654cd5a4944390cf1e160f59cd3b509b66d..6b61b7dee33be10294ae1101f9206144878a3192 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -569,15 +569,13 @@ static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk) } static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, - struct sock *ssk, - unsigned int *bytes) + struct sock *ssk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct sock *sk = (struct sock *)msk; - unsigned int moved = 0; bool more_data_avail; struct tcp_sock *tp; - bool done = false; + bool ret = false; pr_debug("msk=%p ssk=%p\n", msk, ssk); tp = tcp_sk(ssk); @@ -587,20 +585,16 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, struct sk_buff *skb; bool fin; + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) + break; + /* try to move as much data as available */ map_remaining = subflow->map_data_len - mptcp_subflow_get_map_offset(subflow); skb = skb_peek(&ssk->sk_receive_queue); - if (!skb) { - /* With racing move_skbs_to_msk() and __mptcp_move_skbs(), - * a different CPU can have already processed the pending - * data, stop here or we can enter an infinite loop - */ - if (!moved) - done = true; + if (unlikely(!skb)) break; - } if (__mptcp_check_fallback(msk)) { /* Under fallback skbs have no MPTCP extension and TCP could @@ -613,19 +607,13 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, offset = seq - TCP_SKB_CB(skb)->seq; fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN; - if (fin) { - done = true; + if (fin) seq++; - } if (offset < skb->len) { size_t len = skb->len - offset; - if (tp->urg_data) - done = true; - - if (__mptcp_move_skb(msk, ssk, skb, offset, len)) - moved += len; + ret = __mptcp_move_skb(msk, ssk, skb, offset, len) || ret; seq += len; if (unlikely(map_remaining < len)) { @@ -639,22 +627,16 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, } sk_eat_skb(ssk, skb); - done = true; } WRITE_ONCE(tp->copied_seq, seq); more_data_avail = mptcp_subflow_data_available(ssk); - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) { - done = true; - break; - } } while (more_data_avail); - if (moved > 0) + if (ret) msk->last_data_recv = tcp_jiffies32; - *bytes += moved; - return done; + return ret; } static bool __mptcp_ofo_queue(struct mptcp_sock *msk) @@ -748,9 +730,9 @@ void __mptcp_error_report(struct sock *sk) static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk) { struct sock *sk = (struct sock *)msk; - unsigned int moved = 0; + bool moved; - __mptcp_move_skbs_from_subflow(msk, ssk, &moved); + moved = __mptcp_move_skbs_from_subflow(msk, ssk); __mptcp_ofo_queue(msk); if (unlikely(ssk->sk_err)) { if (!sock_owned_by_user(sk)) @@ -766,7 +748,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk) */ if (mptcp_pending_data_fin(sk, NULL)) mptcp_schedule_work(sk); - return moved > 0; + return moved; } static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk) @@ -781,10 +763,6 @@ static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) __mptcp_rcvbuf_update(sk, ssk); - /* over limit? 
can't append more skbs to msk, Also, no need to wake-up*/ - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) - return; - /* Wake-up the reader only for in-sequence data */ if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) sk->sk_data_ready(sk); @@ -884,20 +862,6 @@ bool mptcp_schedule_work(struct sock *sk) return false; } -static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk) -{ - struct mptcp_subflow_context *subflow; - - msk_owned_by_me(msk); - - mptcp_for_each_subflow(msk, subflow) { - if (READ_ONCE(subflow->data_avail)) - return mptcp_subflow_tcp_sock(subflow); - } - - return NULL; -} - static bool mptcp_skb_can_collapse_to(u64 write_seq, const struct sk_buff *skb, const struct mptcp_ext *mpext) @@ -2037,37 +2001,62 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) msk->rcvq_space.time = mstamp; } +static struct mptcp_subflow_context * +__mptcp_first_ready_from(struct mptcp_sock *msk, + struct mptcp_subflow_context *subflow) +{ + struct mptcp_subflow_context *start_subflow = subflow; + + while (!READ_ONCE(subflow->data_avail)) { + subflow = mptcp_next_subflow(msk, subflow); + if (subflow == start_subflow) + return NULL; + } + return subflow; +} + static bool __mptcp_move_skbs(struct sock *sk) { struct mptcp_subflow_context *subflow; struct mptcp_sock *msk = mptcp_sk(sk); - unsigned int moved = 0; - bool ret, done; + bool ret = false; + + if (list_empty(&msk->conn_list)) + return false; /* verify we can move any data from the subflow, eventually updating */ if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) mptcp_for_each_subflow(msk, subflow) __mptcp_rcvbuf_update(sk, subflow->tcp_sock); - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) - return false; - - do { - struct sock *ssk = mptcp_subflow_recv_lookup(msk); + subflow = list_first_entry(&msk->conn_list, + struct mptcp_subflow_context, node); + for (;;) { + struct sock *ssk; bool slowpath; - if (unlikely(!ssk)) + /* + * As an optimization avoid traversing the subflows list + * and ev. acquiring the subflow socket lock before baling out + */ + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) break; - slowpath = lock_sock_fast(ssk); - done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved); + subflow = __mptcp_first_ready_from(msk, subflow); + if (!subflow) + break; + ssk = mptcp_subflow_tcp_sock(subflow); + slowpath = lock_sock_fast(ssk); + ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret; if (unlikely(ssk->sk_err)) __mptcp_error_report(sk); unlock_sock_fast(ssk, slowpath); - } while (!done); - ret = moved > 0 || __mptcp_ofo_queue(msk); + subflow = mptcp_next_subflow(msk, subflow); + } + + __mptcp_ofo_queue(msk); if (ret) mptcp_check_data_fin((struct sock *)msk); return ret; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a1a077bae7b6ec4fab5b266e2613acb145eb343f..ca65f8bff632ff806fe761f86e9aa065b0657d1e 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -354,6 +354,8 @@ struct mptcp_sock { list_for_each_entry(__subflow, &((__msk)->conn_list), node) #define mptcp_for_each_subflow_safe(__msk, __subflow, __tmp) \ list_for_each_entry_safe(__subflow, __tmp, &((__msk)->conn_list), node) +#define mptcp_next_subflow(__msk, __subflow) \ + list_next_entry_circular(__subflow, &((__msk)->conn_list), node) extern struct genl_family mptcp_genl_family;
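One note on the mptcp_next_subflow() helper added at the end of the diff above: it is a thin wrapper around list_next_entry_circular(), which behaves roughly as sketched below (see include/linux/list.h for the authoritative definition). The wrap-around assumes a non-empty list, which is why __mptcp_move_skbs() now returns early when conn_list is empty.

/* Roughly: return the entry following @pos, wrapping back to the first
 * entry when @pos is the last one; the list must not be empty.
 */
#define list_next_entry_circular(pos, head, member)			\
	(list_is_last(&(pos)->member, head) ?				\
	 list_first_entry(head, typeof(*(pos)), member) :		\
	 list_next_entry(pos, member))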