From patchwork Tue Nov 21 11:22:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pengcheng Yang X-Patchwork-Id: 13462843 X-Patchwork-Delegate: bpf@iogearbox.net Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from wangsu.com (unknown [180.101.34.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 73662CA; Tue, 21 Nov 2023 03:22:32 -0800 (PST) Received: from 102.wangsu.com (unknown [59.61.78.234]) by app2 (Coremail) with SMTP id SyJltAAXHQnxklxl4VxoAA--.27190S3; Tue, 21 Nov 2023 19:22:29 +0800 (CST) From: Pengcheng Yang To: John Fastabend , Jakub Sitnicki , Eric Dumazet , Jakub Kicinski , bpf@vger.kernel.org, netdev@vger.kernel.org Cc: Pengcheng Yang Subject: [PATCH bpf-next v2 1/3] skmsg: Support to get the data length in ingress_msg Date: Tue, 21 Nov 2023 19:22:03 +0800 Message-Id: <1700565725-2706-2-git-send-email-yangpc@wangsu.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> References: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> X-CM-TRANSID: SyJltAAXHQnxklxl4VxoAA--.27190S3 X-Coremail-Antispam: 1UD129KBjvJXoWxAr43ZFy5JF1rtF47GFy7Awb_yoWrXFWfpF 1DAa1UAa1DArWxWwsayF45Aw1a9348Xa4jkry7Aw4SyrnYkr1rJF98Gr1YvF1rtr1kC3W7 trsFgFs0kF13WaDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUv21xkIjI8I6I8E6xAIw20EY4v20xvaj40_Wr0E3s1l8cAvFVAK 0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4 x0Y4vE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l 84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I 8CrVACY4xI64kE6c02F40Ex7xfMcIj64x0Y40En7xvr7AKxVWUJVW8JwAv7VCjz48v1sIE Y20_Gr4lYx0Ec7CjxVAajcxG14v26F4j6r4UJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2 IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7MxkIecxEwVAFwVW8KwCF04k20xvY 0x0EwIxGrwCF04k20xvE74AGY7Cv6cx26r48MxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I 0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWU AVWUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcV CY1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAF wI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa 7VU1Hq2tUUUUU== X-CM-SenderInfo: p1dqw1nf6zt0xjvxhudrp/ Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: X-Patchwork-Delegate: bpf@iogearbox.net Currently msg is queued in ingress_msg of the target psock on ingress redirect, without increment rcv_nxt. The size that user can read includes the data in receive_queue and ingress_msg. So we introduce sk_msg_queue_len() helper to get the data length in ingress_msg. Note that the msg_len does not include the data length of msg from recevive_queue via SK_PASS, as they increment rcv_nxt when received. Signed-off-by: Pengcheng Yang --- include/linux/skmsg.h | 26 ++++++++++++++++++++++++-- net/core/skmsg.c | 10 +++++++++- 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c1637515a8a4..423a5c28c606 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -47,6 +47,7 @@ struct sk_msg { u32 apply_bytes; u32 cork_bytes; u32 flags; + bool ingress_self; struct sk_buff *skb; struct sock *sk_redir; struct sock *sk; @@ -82,6 +83,7 @@ struct sk_psock { u32 apply_bytes; u32 cork_bytes; u32 eval; + u32 msg_len; bool redir_ingress; /* undefined if sk_redir is null */ struct sk_msg *cork; struct sk_psock_progs progs; @@ -311,9 +313,11 @@ static inline void sk_psock_queue_msg(struct sk_psock *psock, struct sk_msg *msg) { spin_lock_bh(&psock->ingress_lock); - if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { list_add_tail(&msg->list, &psock->ingress_msg); - else { + if (!msg->ingress_self) + WRITE_ONCE(psock->msg_len, psock->msg_len + msg->sg.size); + } else { sk_msg_free(psock->sk, msg); kfree(msg); } @@ -368,6 +372,24 @@ static inline void kfree_sk_msg(struct sk_msg *msg) kfree(msg); } +static inline void sk_msg_queue_consumed(struct sk_psock *psock, u32 len) +{ + WRITE_ONCE(psock->msg_len, psock->msg_len - len); +} + +static inline u32 sk_msg_queue_len(const struct sock *sk) +{ + struct sk_psock *psock; + u32 len = 0; + + rcu_read_lock(); + psock = sk_psock(sk); + if (psock) + len = READ_ONCE(psock->msg_len); + rcu_read_unlock(); + return len; +} + static inline void sk_psock_report_error(struct sk_psock *psock, int err) { struct sock *sk = psock->sk; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 6c31eefbd777..f46732a8ddc2 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -415,7 +415,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, struct iov_iter *iter = &msg->msg_iter; int peek = flags & MSG_PEEK; struct sk_msg *msg_rx; - int i, copied = 0; + int i, copied = 0, msg_copied = 0; msg_rx = sk_psock_peek_msg(psock); while (copied != len) { @@ -441,6 +441,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, } copied += copy; + if (!msg_rx->ingress_self) + msg_copied += copy; if (likely(!peek)) { sge->offset += copy; sge->length -= copy; @@ -481,6 +483,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, msg_rx = sk_psock_peek_msg(psock); } out: + if (likely(!peek) && msg_copied) + sk_msg_queue_consumed(psock, msg_copied); return copied; } EXPORT_SYMBOL_GPL(sk_msg_recvmsg); @@ -602,6 +606,7 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb if (unlikely(!msg)) return -EAGAIN; + msg->ingress_self = true; skb_set_owner_r(skb, sk); err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); if (err < 0) @@ -771,9 +776,12 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock) list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) { list_del(&msg->list); + if (!msg->ingress_self) + sk_msg_queue_consumed(psock, msg->sg.size); sk_msg_free(psock->sk, msg); kfree(msg); } + WARN_ON_ONCE(READ_ONCE(psock->msg_len) != 0); } static void __sk_psock_zap_ingress(struct sk_psock *psock) From patchwork Tue Nov 21 11:22:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pengcheng Yang X-Patchwork-Id: 13462844 X-Patchwork-Delegate: bpf@iogearbox.net Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from wangsu.com (unknown [180.101.34.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 99A0298; Tue, 21 Nov 2023 03:22:33 -0800 (PST) Received: from 102.wangsu.com (unknown [59.61.78.234]) by app2 (Coremail) with SMTP id SyJltAAXHQnxklxl4VxoAA--.27190S4; Tue, 21 Nov 2023 19:22:30 +0800 (CST) From: Pengcheng Yang To: John Fastabend , Jakub Sitnicki , Eric Dumazet , Jakub Kicinski , bpf@vger.kernel.org, netdev@vger.kernel.org Cc: Pengcheng Yang Subject: [PATCH bpf-next v2 2/3] tcp: Add the data length in skmsg to SIOCINQ ioctl Date: Tue, 21 Nov 2023 19:22:04 +0800 Message-Id: <1700565725-2706-3-git-send-email-yangpc@wangsu.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> References: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> X-CM-TRANSID: SyJltAAXHQnxklxl4VxoAA--.27190S4 X-Coremail-Antispam: 1UD129KBjvdXoW7XF18KrW3Xr4xGFW5ZF48Crg_yoWfKFgE93 9rGF48G3yxWr1IvanFyFZ5t3WS9w18ur1fWF43Ca47ta4UJry5Crs3J3s8Crsrua9FkrZ5 WF93G3y3urya9jkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbI8Fc2x0x2IEx4CE42xK8VAvwI8IcIk0rVWrJVCq3wA2ocxC64kI II0Yj41l84x0c7CEw4AK67xGY2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7 xvwVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2 z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4 xG64xvF2IEw4CE5I8CrVC2j2WlYx0EF7xvrVAajcxG14v26r1j6r4UMcIj6x8ErcxFaVAv 8VW8GwAv7VCY1x0262k0Y48FwI0_Cr0_Gr1UMcvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0x vY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc2xSY4AK67AK6r4kMxAIw28IcxkI 7VAKI48JMxAIw28IcVCjz48v1sIEY20_Gr4l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxV Aqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r12 6r1DMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6x kF7I0E14v26r4j6F4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AK xVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvj fUwL0eDUUUU X-CM-SenderInfo: p1dqw1nf6zt0xjvxhudrp/ Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: X-Patchwork-Delegate: bpf@iogearbox.net SIOCINQ ioctl returns the number unread bytes of the receive queue but does not include the ingress_msg queue. With the sk_msg redirect, an application may get a value 0 if it calls SIOCINQ ioctl before recv() to determine the readable size. Signed-off-by: Pengcheng Yang --- net/ipv4/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 3d3a24f79573..04da0684c397 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -267,6 +267,7 @@ #include #include #include +#include #include #include @@ -613,7 +614,7 @@ int tcp_ioctl(struct sock *sk, int cmd, int *karg) return -EINVAL; slow = lock_sock_fast(sk); - answ = tcp_inq(sk); + answ = tcp_inq(sk) + sk_msg_queue_len(sk); unlock_sock_fast(sk, slow); break; case SIOCATMARK: From patchwork Tue Nov 21 11:22:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pengcheng Yang X-Patchwork-Id: 13462845 X-Patchwork-Delegate: bpf@iogearbox.net Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from wangsu.com (unknown [180.101.34.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A6114CB; Tue, 21 Nov 2023 03:22:34 -0800 (PST) Received: from 102.wangsu.com (unknown [59.61.78.234]) by app2 (Coremail) with SMTP id SyJltAAXHQnxklxl4VxoAA--.27190S5; Tue, 21 Nov 2023 19:22:31 +0800 (CST) From: Pengcheng Yang To: John Fastabend , Jakub Sitnicki , Eric Dumazet , Jakub Kicinski , bpf@vger.kernel.org, netdev@vger.kernel.org Cc: Pengcheng Yang Subject: [PATCH bpf-next v2 3/3] tcp_diag: Add the data length in skmsg to rx_queue Date: Tue, 21 Nov 2023 19:22:05 +0800 Message-Id: <1700565725-2706-4-git-send-email-yangpc@wangsu.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> References: <1700565725-2706-1-git-send-email-yangpc@wangsu.com> X-CM-TRANSID: SyJltAAXHQnxklxl4VxoAA--.27190S5 X-Coremail-Antispam: 1UD129KBjvdXoWrZF4fAr13AF1fCr47Jr17Awb_yoW3Kwb_uw n7ZrW8W3srXr1xta1xZFZxJFyYk34IyFn5W3WS9a4qy34DJF9xuw4rXF98Ars7CanxCrZ5 ur1DJryUG34rujkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUbI8Fc2x0x2IEx4CE42xK8VAvwI8IcIk0rVWrJVCq3wA2ocxC64kI II0Yj41l84x0c7CEw4AK67xGY2AK021l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7 xvwVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2 z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4 xG64xvF2IEw4CE5I8CrVC2j2WlYx0EF7xvrVAajcxG14v26r1j6r4UMcIj6x8ErcxFaVAv 8VW8GwAv7VCY1x0262k0Y48FwI0_Cr0_Gr1UMcvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0x vY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc2xSY4AK67AK6r4kMxAIw28IcxkI 7VAKI48JMxAIw28IcVCjz48v1sIEY20_Gr4l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxV Aqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r12 6r1DMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6x kF7I0E14v26r4j6F4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AK xVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvj fUwL0eDUUUU X-CM-SenderInfo: p1dqw1nf6zt0xjvxhudrp/ Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: X-Patchwork-Delegate: bpf@iogearbox.net In order to help us track the data length in ingress_msg when using sk_msg redirect. Signed-off-by: Pengcheng Yang --- net/ipv4/tcp_diag.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c index 01b50fa79189..b22382820a4b 100644 --- a/net/ipv4/tcp_diag.c +++ b/net/ipv4/tcp_diag.c @@ -11,6 +11,7 @@ #include #include +#include #include #include @@ -28,6 +29,7 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r, r->idiag_rqueue = max_t(int, READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq), 0); + r->idiag_rqueue += sk_msg_queue_len(sk); r->idiag_wqueue = READ_ONCE(tp->write_seq) - tp->snd_una; } if (info)