
[v4,4/4] libceph: use sendpages_ok() instead of sendpage_ok()

Message ID 20240611063618.106485-5-ofir.gal@volumez.com (mailing list archive)
State New, archived
Series [v4,1/4] net: introduce helper sendpages_ok()

Commit Message

Ofir Gal June 11, 2024, 6:36 a.m. UTC
Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() to
decide whether to enable MSG_SPLICE_PAGES. It checks only the first page
of the iterator, but the iterator may represent several contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When ceph_tcp_sendpage() or do_try_sendpage() sends an iterator whose
first page is sendable but one of the other pages isn't,
skb_splice_from_iter() warns and aborts the data transfer.

Using the new helper sendpages_ok() to decide whether to enable
MSG_SPLICE_PAGES solves the issue.

Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
---
 net/ceph/messenger_v1.c | 2 +-
 net/ceph/messenger_v2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
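
The difference between the two checks described in the commit message can
be illustrated with a small userspace model. This is only a sketch: struct
model_page, model_sendpage_ok() and model_sendpages_ok() are hypothetical
stand-ins written for illustration, not the kernel helpers, and the slab
and refcount fields merely approximate what the kernel inspects on a real
struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the names echo the kernel's but this is not kernel code. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE ((size_t)1 << MODEL_PAGE_SHIFT)

struct model_page {
	bool slab;     /* assumption: stands in for a slab-backed page */
	int  refcount; /* assumption: stands in for the page reference count */
};

/* Single-page check: the page is not slab-backed and is still referenced. */
static bool model_sendpage_ok(const struct model_page *page)
{
	return !page->slab && page->refcount >= 1;
}

/*
 * Range check: every page overlapped by [offset, offset + len) must pass
 * the single-page check. An empty range has nothing to reject.
 */
static bool model_sendpages_ok(const struct model_page *pages, size_t len,
			       size_t offset)
{
	size_t first, last, i;

	assert(pages != NULL);
	if (len == 0)
		return true;

	first = offset >> MODEL_PAGE_SHIFT;
	last = (offset + len - 1) >> MODEL_PAGE_SHIFT;
	for (i = first; i <= last; i++) {
		if (!model_sendpage_ok(&pages[i]))
			return false;
	}
	return true;
}
```

With a run of three pages where only the middle one is slab-backed,
model_sendpage_ok() on the first page succeeds while model_sendpages_ok()
over a two-page range starting at that page fails, which is the mismatch
the patch is aimed at.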

Comments

Ofir Gal July 16, 2024, 12:46 p.m. UTC | #1
Xiubo/Ilya, please take a look.

On 6/11/24 09:36, Ofir Gal wrote:
> Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() in
> order to enable MSG_SPLICE_PAGES, it check the first page of the
> iterator, the iterator may represent contiguous pages.
>
> MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
> pages it sends with sendpage_ok().
>
> When ceph_tcp_sendpage() or do_try_sendpage() send an iterator that the
> first page is sendable, but one of the other pages isn't
> skb_splice_from_iter() warns and aborts the data transfer.
>
> Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
> solves the issue.
>
> Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
> ---
>  net/ceph/messenger_v1.c | 2 +-
>  net/ceph/messenger_v2.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> index 0cb61c76b9b8..a6788f284cd7 100644
> --- a/net/ceph/messenger_v1.c
> +++ b/net/ceph/messenger_v1.c
> @@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
>  	 * coalescing neighboring slab objects into a single frag which
>  	 * triggers one of hardened usercopy checks.
>  	 */
> -	if (sendpage_ok(page))
> +	if (sendpages_ok(page, size, offset))
>  		msg.msg_flags |= MSG_SPLICE_PAGES;
>  
>  	bvec_set_page(&bvec, page, size, offset);
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index bd608ffa0627..27f8f6c8eb60 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
>  		 * coalescing neighboring slab objects into a single frag
>  		 * which triggers one of hardened usercopy checks.
>  		 */
> -		if (sendpage_ok(bv.bv_page))
> +		if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
>  			msg.msg_flags |= MSG_SPLICE_PAGES;
>  		else
>  			msg.msg_flags &= ~MSG_SPLICE_PAGES;
Ilya Dryomov July 17, 2024, 8:26 p.m. UTC | #2
On Tue, Jul 16, 2024 at 2:46 PM Ofir Gal <ofir.gal@volumez.com> wrote:
>
> Xiubo/Ilya please take a look
>
> On 6/11/24 09:36, Ofir Gal wrote:
> > Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() in
> > order to enable MSG_SPLICE_PAGES, it check the first page of the
> > iterator, the iterator may represent contiguous pages.
> >
> > MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
> > pages it sends with sendpage_ok().
> >
> > When ceph_tcp_sendpage() or do_try_sendpage() send an iterator that the
> > first page is sendable, but one of the other pages isn't
> > skb_splice_from_iter() warns and aborts the data transfer.
> >
> > Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
> > solves the issue.
> >
> > Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
> > ---
> >  net/ceph/messenger_v1.c | 2 +-
> >  net/ceph/messenger_v2.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> > index 0cb61c76b9b8..a6788f284cd7 100644
> > --- a/net/ceph/messenger_v1.c
> > +++ b/net/ceph/messenger_v1.c
> > @@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
> >        * coalescing neighboring slab objects into a single frag which
> >        * triggers one of hardened usercopy checks.
> >        */
> > -     if (sendpage_ok(page))
> > +     if (sendpages_ok(page, size, offset))
> >               msg.msg_flags |= MSG_SPLICE_PAGES;
> >
> >       bvec_set_page(&bvec, page, size, offset);
> > diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> > index bd608ffa0627..27f8f6c8eb60 100644
> > --- a/net/ceph/messenger_v2.c
> > +++ b/net/ceph/messenger_v2.c
> > @@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
> >                * coalescing neighboring slab objects into a single frag
> >                * which triggers one of hardened usercopy checks.
> >                */
> > -             if (sendpage_ok(bv.bv_page))
> > +             if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
> >                       msg.msg_flags |= MSG_SPLICE_PAGES;
> >               else
> >                       msg.msg_flags &= ~MSG_SPLICE_PAGES;
>

Hi Ofir,

Ceph should be fine as is -- there is an internal "cursor" abstraction
that is limited to PAGE_SIZE chunks, using bvec_iter_bvec() instead
of mp_bvec_iter_bvec(), etc.  This means that both do_try_sendpage() and
ceph_tcp_sendpage() should be called only with

  page_off + len <= PAGE_SIZE

being true even if the page is contiguous (and that we lose out on the
potential performance benefit, of course...).

That said, if the plan is to remove sendpage_ok() so that it doesn't
accidentally grow new users who are unaware of this pitfall, consider
this

Acked-by: Ilya Dryomov <idryomov@gmail.com>

Thanks,

                Ilya
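
The invariant page_off + len <= PAGE_SIZE mentioned above can be sketched
in userspace as follows. This is an illustrative model under assumptions,
not the libceph cursor code: model_next_chunk() and model_count_chunks()
are hypothetical names, and the page size is fixed at 4096 bytes for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; a 4096-byte page is assumed for illustration. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE ((size_t)1 << MODEL_PAGE_SHIFT)

/* Hypothetical helper: length of the next chunk that stays within one page. */
static size_t model_next_chunk(size_t offset, size_t remaining)
{
	size_t page_off = offset % MODEL_PAGE_SIZE;
	size_t room = MODEL_PAGE_SIZE - page_off;

	return remaining < room ? remaining : room;
}

/*
 * Walk [offset, offset + len) one chunk at a time and count the chunks,
 * asserting page_off + chunk_len <= PAGE_SIZE for each of them.
 */
static size_t model_count_chunks(size_t offset, size_t len)
{
	size_t chunks = 0;

	while (len > 0) {
		size_t n = model_next_chunk(offset, len);

		assert(offset % MODEL_PAGE_SIZE + n <= MODEL_PAGE_SIZE);
		offset += n;
		len -= n;
		chunks++;
	}
	return chunks;
}
```

Because each chunk produced this way touches exactly one page, a
single-page check on that page already covers the whole chunk, at the
cost of issuing more, smaller sends for physically contiguous data.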
Sagi Grimberg July 17, 2024, 10:51 p.m. UTC | #3
On 17/07/2024 23:26, Ilya Dryomov wrote:
> On Tue, Jul 16, 2024 at 2:46 PM Ofir Gal <ofir.gal@volumez.com> wrote:
>> Xiubo/Ilya please take a look
>>
>> On 6/11/24 09:36, Ofir Gal wrote:
>>> Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() in
>>> order to enable MSG_SPLICE_PAGES, it check the first page of the
>>> iterator, the iterator may represent contiguous pages.
>>>
>>> MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
>>> pages it sends with sendpage_ok().
>>>
>>> When ceph_tcp_sendpage() or do_try_sendpage() send an iterator that the
>>> first page is sendable, but one of the other pages isn't
>>> skb_splice_from_iter() warns and aborts the data transfer.
>>>
>>> Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
>>> solves the issue.
>>>
>>> Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
>>> ---
>>>   net/ceph/messenger_v1.c | 2 +-
>>>   net/ceph/messenger_v2.c | 2 +-
>>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
>>> index 0cb61c76b9b8..a6788f284cd7 100644
>>> --- a/net/ceph/messenger_v1.c
>>> +++ b/net/ceph/messenger_v1.c
>>> @@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
>>>         * coalescing neighboring slab objects into a single frag which
>>>         * triggers one of hardened usercopy checks.
>>>         */
>>> -     if (sendpage_ok(page))
>>> +     if (sendpages_ok(page, size, offset))
>>>                msg.msg_flags |= MSG_SPLICE_PAGES;
>>>
>>>        bvec_set_page(&bvec, page, size, offset);
>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>> index bd608ffa0627..27f8f6c8eb60 100644
>>> --- a/net/ceph/messenger_v2.c
>>> +++ b/net/ceph/messenger_v2.c
>>> @@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
>>>                 * coalescing neighboring slab objects into a single frag
>>>                 * which triggers one of hardened usercopy checks.
>>>                 */
>>> -             if (sendpage_ok(bv.bv_page))
>>> +             if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
>>>                        msg.msg_flags |= MSG_SPLICE_PAGES;
>>>                else
>>>                        msg.msg_flags &= ~MSG_SPLICE_PAGES;
> Hi Ofir,
>
> Ceph should be fine as is -- there is an internal "cursor" abstraction
> that that is limited to PAGE_SIZE chunks, using bvec_iter_bvec() instead
> of mp_bvec_iter_bvec(), etc.  This means that both do_try_sendpage() and
> ceph_tcp_sendpage() should be called only with
>
>    page_off + len <= PAGE_SIZE
>
> being true even if the page is contiguous (and that we lose out on the
> potential performance benefit, of course...).
>
> That said, if the plan is to remove sendpage_ok() so that it doesn't
> accidentally grow new users who are unaware of this pitfall, consider
> this
>
> Acked-by: Ilya Dryomov <idryomov@gmail.com>

Which tree should this go through? We can take it via the nvme tree,
unless someone else wants to queue it up...
Ofir Gal July 18, 2024, 8:31 a.m. UTC | #4
On 7/17/24 23:26, Ilya Dryomov wrote:
> On Tue, Jul 16, 2024 at 2:46 PM Ofir Gal <ofir.gal@volumez.com> wrote:
>>
>> Xiubo/Ilya please take a look
>>
>> On 6/11/24 09:36, Ofir Gal wrote:
>>> Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() in
>>> order to enable MSG_SPLICE_PAGES, it check the first page of the
>>> iterator, the iterator may represent contiguous pages.
>>>
>>> MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
>>> pages it sends with sendpage_ok().
>>>
>>> When ceph_tcp_sendpage() or do_try_sendpage() send an iterator that the
>>> first page is sendable, but one of the other pages isn't
>>> skb_splice_from_iter() warns and aborts the data transfer.
>>>
>>> Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
>>> solves the issue.
>>>
>>> Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
>>> ---
>>>  net/ceph/messenger_v1.c | 2 +-
>>>  net/ceph/messenger_v2.c | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
>>> index 0cb61c76b9b8..a6788f284cd7 100644
>>> --- a/net/ceph/messenger_v1.c
>>> +++ b/net/ceph/messenger_v1.c
>>> @@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
>>>        * coalescing neighboring slab objects into a single frag which
>>>        * triggers one of hardened usercopy checks.
>>>        */
>>> -     if (sendpage_ok(page))
>>> +     if (sendpages_ok(page, size, offset))
>>>               msg.msg_flags |= MSG_SPLICE_PAGES;
>>>
>>>       bvec_set_page(&bvec, page, size, offset);
>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>> index bd608ffa0627..27f8f6c8eb60 100644
>>> --- a/net/ceph/messenger_v2.c
>>> +++ b/net/ceph/messenger_v2.c
>>> @@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
>>>                * coalescing neighboring slab objects into a single frag
>>>                * which triggers one of hardened usercopy checks.
>>>                */
>>> -             if (sendpage_ok(bv.bv_page))
>>> +             if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
>>>                       msg.msg_flags |= MSG_SPLICE_PAGES;
>>>               else
>>>                       msg.msg_flags &= ~MSG_SPLICE_PAGES;
>>
>
> Hi Ofir,
>
> Ceph should be fine as is -- there is an internal "cursor" abstraction
> that that is limited to PAGE_SIZE chunks, using bvec_iter_bvec() instead
> of mp_bvec_iter_bvec(), etc.  This means that both do_try_sendpage() and
> ceph_tcp_sendpage() should be called only with
>
>   page_off + len <= PAGE_SIZE
>
> being true even if the page is contiguous (and that we lose out on the
> potential performance benefit, of course...).
>
> That said, if the plan is to remove sendpage_ok() so that it doesn't
> accidentally grow new users who are unaware of this pitfall, consider
> this
>
> Acked-by: Ilya Dryomov <idryomov@gmail.com>
>
> Thanks,
>
>                 Ilya
I don't think the plan is to remove sendpage_ok() (unless someone says
otherwise). I'm sending v5 without the libceph patch.

Thanks

Patch

diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index 0cb61c76b9b8..a6788f284cd7 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -94,7 +94,7 @@  static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
 	 * coalescing neighboring slab objects into a single frag which
 	 * triggers one of hardened usercopy checks.
 	 */
-	if (sendpage_ok(page))
+	if (sendpages_ok(page, size, offset))
 		msg.msg_flags |= MSG_SPLICE_PAGES;
 
 	bvec_set_page(&bvec, page, size, offset);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index bd608ffa0627..27f8f6c8eb60 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -165,7 +165,7 @@  static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
 		 * coalescing neighboring slab objects into a single frag
 		 * which triggers one of hardened usercopy checks.
 		 */
-		if (sendpage_ok(bv.bv_page))
+		if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
 			msg.msg_flags |= MSG_SPLICE_PAGES;
 		else
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;