diff mbox series

[net,v2,2/2] net: clarify SO_DEVMEM_DONTNEED behavior in documentation

Message ID 20241107210331.3044434-2-almasrymina@google.com (mailing list archive)
State Accepted
Commit 102d1404c385611c574498b1e0d1f3762e253359
Delegated to: Netdev Maintainers
Headers show
Series [net,v2,1/2] net: fix SO_DEVMEM_DONTNEED looping too long | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-11--21-00 (tests: 787)

Commit Message

Mina Almasry Nov. 7, 2024, 9:03 p.m. UTC
Document new behavior when the number of frags passed is too big.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 Documentation/networking/devmem.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Stanislav Fomichev Nov. 8, 2024, 1:30 a.m. UTC | #1
On 11/07, Mina Almasry wrote:
> Document new behavior when the number of frags passed is too big.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
>  Documentation/networking/devmem.rst | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> index a55bf21f671c..d95363645331 100644
> --- a/Documentation/networking/devmem.rst
> +++ b/Documentation/networking/devmem.rst
> @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
>  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
>  and will lead to packet drops.
>  
> +The user must pass no more than 128 tokens, with no more than 1024 total frags
> +among the token->token_count across all the tokens. If the user provides more
> +than 1024 frags, the kernel will free up to 1024 frags and return early.
> +
> +The kernel returns the number of actual frags freed. The number of frags freed
> +can be less than the tokens provided by the user in case of:
> +

[..]

> +(a) an internal kernel leak bug.

If you're gonna respin, might be worth mentioning that the dmesg
will contain a warning in case of a leak?
Mina Almasry Nov. 8, 2024, 1:40 a.m. UTC | #2
On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 11/07, Mina Almasry wrote:
> > Document new behavior when the number of frags passed is too big.
> >
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > ---
> >  Documentation/networking/devmem.rst | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > index a55bf21f671c..d95363645331 100644
> > --- a/Documentation/networking/devmem.rst
> > +++ b/Documentation/networking/devmem.rst
> > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> >  and will lead to packet drops.
> >
> > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > +among the token->token_count across all the tokens. If the user provides more
> > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > +
> > +The kernel returns the number of actual frags freed. The number of frags freed
> > +can be less than the tokens provided by the user in case of:
> > +
>
> [..]
>
> > +(a) an internal kernel leak bug.
>
> If you're gonna respin, might be worth mentioning that the dmesg
> will contain a warning in case of a leak?

We will not actually warn in the likely cases of leak.

We warn when we find an entry in the xarray that is not a net_iov, or
if napi_pp_put_page fails on that net_iov. Both are very unlikely to
happen honestly.

The likely 'leaks' are when we don't find the frag_id in the xarray.
We do not warn on that because the user can intentionally trigger the
warning with invalid input. If the user is actually giving valid input
and the warn still happens, likely a kernel bug like I mentioned in
another thread, but we still don't warn.
Stanislav Fomichev Nov. 8, 2024, 3:01 a.m. UTC | #3
On 11/07, Mina Almasry wrote:
> On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 11/07, Mina Almasry wrote:
> > > Document new behavior when the number of frags passed is too big.
> > >
> > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > ---
> > >  Documentation/networking/devmem.rst | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > index a55bf21f671c..d95363645331 100644
> > > --- a/Documentation/networking/devmem.rst
> > > +++ b/Documentation/networking/devmem.rst
> > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > >  and will lead to packet drops.
> > >
> > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > +among the token->token_count across all the tokens. If the user provides more
> > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > +
> > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > +can be less than the tokens provided by the user in case of:
> > > +
> >
> > [..]
> >
> > > +(a) an internal kernel leak bug.
> >
> > If you're gonna respin, might be worth mentioning that the dmesg
> > will contain a warning in case of a leak?
> 
> We will not actually warn in the likely cases of leak.
> 
> We warn when we find an entry in the xarray that is not a net_iov, or
> if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> happen honestly.
> 
> The likely 'leaks' are when we don't find the frag_id in the xarray.
> We do not warn on that because the user can intentionally trigger the
> warning with invalid input. If the user is actually giving valid input
> and the warn still happens, likely a kernel bug like I mentioned in
> another thread, but we still don't warn.

In this case, maybe don't mention the leaks at all? If it's not
actionable, not sure how it helps?
Mina Almasry Nov. 8, 2024, 4:30 p.m. UTC | #4
On Thu, Nov 7, 2024 at 7:01 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 11/07, Mina Almasry wrote:
> > On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 11/07, Mina Almasry wrote:
> > > > Document new behavior when the number of frags passed is too big.
> > > >
> > > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > > ---
> > > >  Documentation/networking/devmem.rst | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > > index a55bf21f671c..d95363645331 100644
> > > > --- a/Documentation/networking/devmem.rst
> > > > +++ b/Documentation/networking/devmem.rst
> > > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > > >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > > >  and will lead to packet drops.
> > > >
> > > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > > +among the token->token_count across all the tokens. If the user provides more
> > > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > > +
> > > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > > +can be less than the tokens provided by the user in case of:
> > > > +
> > >
> > > [..]
> > >
> > > > +(a) an internal kernel leak bug.
> > >
> > > If you're gonna respin, might be worth mentioning that the dmesg
> > > will contain a warning in case of a leak?
> >
> > We will not actually warn in the likely cases of leak.
> >
> > We warn when we find an entry in the xarray that is not a net_iov, or
> > if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> > happen honestly.
> >
> > The likely 'leaks' are when we don't find the frag_id in the xarray.
> > We do not warn on that because the user can intentionally trigger the
> > warning with invalid input. If the user is actually giving valid input
> > and the warn still happens, likely a kernel bug like I mentioned in
> > another thread, but we still don't warn.
>
> In this case, maybe don't mention the leaks at all? If it's not
> actionable, not sure how it helps?

It's good to explain what the return code of the setsockopt means, and
when it would be less than the number of passed in tokens.

Also it's not really 'not actionable'. I expect serious users of
devmem tcp to log such leaks in metrics and try to root cause the
userspace or kernel bug causing them if they happen.
Stanislav Fomichev Nov. 8, 2024, 6:07 p.m. UTC | #5
On 11/08, Mina Almasry wrote:
> On Thu, Nov 7, 2024 at 7:01 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 11/07, Mina Almasry wrote:
> > > On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > >
> > > > On 11/07, Mina Almasry wrote:
> > > > > Document new behavior when the number of frags passed is too big.
> > > > >
> > > > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > > > ---
> > > > >  Documentation/networking/devmem.rst | 9 +++++++++
> > > > >  1 file changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > > > index a55bf21f671c..d95363645331 100644
> > > > > --- a/Documentation/networking/devmem.rst
> > > > > +++ b/Documentation/networking/devmem.rst
> > > > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > > > >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > > > >  and will lead to packet drops.
> > > > >
> > > > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > > > +among the token->token_count across all the tokens. If the user provides more
> > > > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > > > +
> > > > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > > > +can be less than the tokens provided by the user in case of:
> > > > > +
> > > >
> > > > [..]
> > > >
> > > > > +(a) an internal kernel leak bug.
> > > >
> > > > If you're gonna respin, might be worth mentioning that the dmesg
> > > > will contain a warning in case of a leak?
> > >
> > > We will not actually warn in the likely cases of leak.
> > >
> > > We warn when we find an entry in the xarray that is not a net_iov, or
> > > if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> > > happen honestly.
> > >
> > > The likely 'leaks' are when we don't find the frag_id in the xarray.
> > > We do not warn on that because the user can intentionally trigger the
> > > warning with invalid input. If the user is actually giving valid input
> > > and the warn still happens, likely a kernel bug like I mentioned in
> > > another thread, but we still don't warn.
> >
> > In this case, maybe don't mention the leaks at all? If it's not
> > actionable, not sure how it helps?
> 
> It's good to explain what the return code of the setsockopt means, and
> when it would be less than the number of passed in tokens.
> 
> Also it's not really 'not actionable'. I expect serious users of
> devmem tcp to log such leaks in metrics and try to root cause the
> userspace or kernel bug causing them if they happen.

Right now it reads like both (a) and (b) have a similar probability. Maybe
even (a) is more probable because you mention it first? In theory, any syscall
can have a bug in it where it returns something bogus, so maybe at least
downplay the 'leak' part a bit? "In the extremely rare cases, kernel
might free less frags than requested .... "

Imagine a situation where the user inadvertently tries to free the same token
twice or something and gets the unexpected return value. Why? Might be
the kernel leak, right?

From the POW of the kernel, the most probable cases where we return
less tokens are:
1. user gave us more than 1024
2. user gave us incorrect tokens
...
99. kernel is full of bugs and we lost the frag
Mina Almasry Nov. 8, 2024, 6:45 p.m. UTC | #6
On Fri, Nov 8, 2024 at 10:07 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 11/08, Mina Almasry wrote:
> > On Thu, Nov 7, 2024 at 7:01 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 11/07, Mina Almasry wrote:
> > > > On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > > >
> > > > > On 11/07, Mina Almasry wrote:
> > > > > > Document new behavior when the number of frags passed is too big.
> > > > > >
> > > > > > Signed-off-by: Mina Almasry <almasrymina@google.com>
> > > > > > ---
> > > > > >  Documentation/networking/devmem.rst | 9 +++++++++
> > > > > >  1 file changed, 9 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > > > > index a55bf21f671c..d95363645331 100644
> > > > > > --- a/Documentation/networking/devmem.rst
> > > > > > +++ b/Documentation/networking/devmem.rst
> > > > > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > > > > >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > > > > >  and will lead to packet drops.
> > > > > >
> > > > > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > > > > +among the token->token_count across all the tokens. If the user provides more
> > > > > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > > > > +
> > > > > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > > > > +can be less than the tokens provided by the user in case of:
> > > > > > +
> > > > >
> > > > > [..]
> > > > >
> > > > > > +(a) an internal kernel leak bug.
> > > > >
> > > > > If you're gonna respin, might be worth mentioning that the dmesg
> > > > > will contain a warning in case of a leak?
> > > >
> > > > We will not actually warn in the likely cases of leak.
> > > >
> > > > We warn when we find an entry in the xarray that is not a net_iov, or
> > > > if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> > > > happen honestly.
> > > >
> > > > The likely 'leaks' are when we don't find the frag_id in the xarray.
> > > > We do not warn on that because the user can intentionally trigger the
> > > > warning with invalid input. If the user is actually giving valid input
> > > > and the warn still happens, likely a kernel bug like I mentioned in
> > > > another thread, but we still don't warn.
> > >
> > > In this case, maybe don't mention the leaks at all? If it's not
> > > actionable, not sure how it helps?
> >
> > It's good to explain what the return code of the setsockopt means, and
> > when it would be less than the number of passed in tokens.
> >
> > Also it's not really 'not actionable'. I expect serious users of
> > devmem tcp to log such leaks in metrics and try to root cause the
> > userspace or kernel bug causing them if they happen.
>
> Right now it reads like both (a) and (b) have a similar probability. Maybe
> even (a) is more probable because you mention it first? In theory, any syscall
> can have a bug in it where it returns something bogus, so maybe at least
> downplay the 'leak' part a bit? "In the extremely rare cases, kernel
> might free less frags than requested .... "
>
> Imagine a situation where the user inadvertently tries to free the same token
> twice or something and gets the unexpected return value. Why? Might be
> the kernel leak, right?
>
> From the POW of the kernel, the most probable cases where we return
> less tokens are:
> 1. user gave us more than 1024
> 2. user gave us incorrect tokens
> ...
> 99. kernel is full of bugs and we lost the frag

The current wording doesn't make any comment about probability. More
information is better than less. I don't see a strong reason to omit
information. I think the docs are better now and will be improved
further in the future. Lets not bike shed too much on docs wording.
It's painfully obvious invalid input is more likely not subtle kernel
bugs IMO without calling out.
diff mbox series

Patch

diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a55bf21f671c..d95363645331 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -225,6 +225,15 @@  The user must ensure the tokens are returned to the kernel in a timely manner.
 Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
 and will lead to packet drops.
 
+The user must pass no more than 128 tokens, with no more than 1024 total frags
+among the token->token_count across all the tokens. If the user provides more
+than 1024 frags, the kernel will free up to 1024 frags and return early.
+
+The kernel returns the number of actual frags freed. The number of frags freed
+can be less than the tokens provided by the user in case of:
+
+(a) an internal kernel leak bug.
+(b) the user passed more than 1024 frags.
 
 Implementation & Caveats
 ========================