diff mbox series

9p: Avoid creating multiple slab caches with the same name

Message ID 20240807094725.2193423-1-pedro.falcato@gmail.com (mailing list archive)
State New
Headers show
Series 9p: Avoid creating multiple slab caches with the same name | expand

Commit Message

Pedro Falcato Aug. 7, 2024, 9:47 a.m. UTC
In the spirit of [1], avoid creating multiple slab caches with the same
name. Instead, add the dev_name into the mix.

[1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 net/9p/client.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe Oct. 18, 2024, 5:28 p.m. UTC | #1
On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote:
> In the spirit of [1], avoid creating multiple slab caches with the same
> name. Instead, add the dev_name into the mix.
> 
> [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/
> 
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
>  net/9p/client.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Can this get picked up to rc4 please?

It is causing regressions in my environment

#regzbot introduced: 4c39529663b9

Thanks,
Jason
Pedro Falcato Oct. 18, 2024, 6:36 p.m. UTC | #2
On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote:
> > In the spirit of [1], avoid creating multiple slab caches with the same
> > name. Instead, add the dev_name into the mix.
> >
> > [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > ---
> >  net/9p/client.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
>
> Can this get picked up to rc4 please?

FWIW, seems to have been picked up into 9pfs-next
(https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259).
Seems like we're just missing a PR to Linus?
Thorsten Leemhuis Oct. 18, 2024, 6:54 p.m. UTC | #3
[reinserting a bit of quotation below for Linus]

TLDR: Linus, I wonder if it would be best if you could merge the patch
at the start of this thread
(https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
), which can also be found as 79efebae4afc22 in -next if you prefer to
cherry-pick it from there. Either now or after 24 to 36 hours, which
would give Eric a chance to ACK/NACK this if he sees this mail.

For details see below.

On 18.10.24 20:36, Pedro Falcato wrote:
> On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>> On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote:
>>> In the spirit of [1], avoid creating multiple slab caches with the same
>>> name. Instead, add the dev_name into the mix.
>>>
>>> [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/
>>>
>>> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
>>> ---
>>>  net/9p/client.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> Can this get picked up to rc4 please?

Eric apparently has not much time or upstream work currently (but
recently showed up in a discussion about another regression:
https://lore.kernel.org/all/Zw-J0DdrCFLYpT5y@codewreck.org/ ).

>> It is causing regressions in my environment
>> #regzbot introduced: 4c39529663b9

If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate
cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing
Vlastimil, so he knows about this.

> FWIW, seems to have been picked up into 9pfs-next
> (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259).
> Seems like we're just missing a PR to Linus?

In that case: Linus, given the circumstances I wonder if it would be
best if you could merge the patch at the start of this thread
(https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
) directly, which can also be found as 79efebae4afc22 in -next if you
prefer to cherry-pick it from there  Either now or after 24 to 36 hours,
which would give Eric a chance to ACK/NACK this if he sees this mail.

Ciao, Thorsten
Vlastimil Babka Oct. 18, 2024, 9:38 p.m. UTC | #4
On 10/18/24 8:54 PM, Thorsten Leemhuis wrote:
> [reinserting a bit of quotation below for Linus]
> 
> TLDR: Linus, I wonder if it would be best if you could merge the patch
> at the start of this thread
> (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
> ), which can also be found as 79efebae4afc22 in -next if you prefer to
> cherry-pick it from there. Either now or after 24 to 36 hours, which
> would give Eric a chance to ACK/NACK this if he sees this mail.
> 
> For details see below.
> 
> On 18.10.24 20:36, Pedro Falcato wrote:
>> On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>
>>> On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote:
>>>> In the spirit of [1], avoid creating multiple slab caches with the same
>>>> name. Instead, add the dev_name into the mix.
>>>>
>>>> [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/
>>>>
>>>> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
>>>> ---
>>>>  net/9p/client.c | 10 +++++++++-
>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> Can this get picked up to rc4 please?
> 
> Eric apparently has not much time or upstream work currently (but
> recently showed up in a discussion about another regression:
> https://lore.kernel.org/all/Zw-J0DdrCFLYpT5y@codewreck.org/ ).
> 
>>> It is causing regressions in my environment
>>> #regzbot introduced: 4c39529663b9
> 
> If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate
> cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing
> Vlastimil, so he knows about this.
> 
>> FWIW, seems to have been picked up into 9pfs-next
>> (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259).
>> Seems like we're just missing a PR to Linus?

If was Dominique, not Eric, see:
https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/

Can you send it as mentioned in the email? :)

> In that case: Linus, given the circumstances I wonder if it would be
> best if you could merge the patch at the start of this thread
> (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
> ) directly, which can also be found as 79efebae4afc22 in -next if you
> prefer to cherry-pick it from there  Either now or after 24 to 36 hours,
> which would give Eric a chance to ACK/NACK this if he sees this mail.
> 
> Ciao, Thorsten
Dominique Martinet Oct. 19, 2024, 12:02 a.m. UTC | #5
Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200:
> > In that case: Linus, given the circumstances I wonder if it would be
> > best if you could merge the patch at the start of this thread
> > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
> > ) directly, which can also be found as 79efebae4afc22 in -next if you
> > prefer to cherry-pick it from there  Either now or after 24 to 36 hours,
> > which would give Eric a chance to ACK/NACK this if he sees this mail.

Sorry to everyone involved, I've just sent the merge request - I wasn't
much at computer in the past few weeks so wanted to wait until I got
back home to send it just in case, as I didn't realize this was a recent
regression that caused actual harm (it sounded like an old warning that
someone just recently happened to hit, and sounded easy enough to work
around locally if there is only one specific setup involved); I should
have sent the fix separately or at least corrected myself about the
schedule.

> >>> It is causing regressions in my environment
> >>> #regzbot introduced: 4c39529663b9
> > 
> > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate
> > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing
> > Vlastimil, so he knows about this.

(that might have been nice to have as a Fixes tag for eventual backport,
but at least that commit doesn't appear to have been picked up by stable
so it's probably fine as is)


Thanks,
Thorsten Leemhuis Oct. 19, 2024, 5:28 a.m. UTC | #6
On 18.10.24 23:38, Vlastimil Babka wrote:
> On 10/18/24 8:54 PM, Thorsten Leemhuis wrote:

>>> FWIW, seems to have been picked up into 9pfs-next
>>> (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259).
>>> Seems like we're just missing a PR to Linus?
> 
> If was Dominique, not Eric, see:
> https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/

Ahh, sorry, missed that. Apologies again.

> Can you send it as mentioned in the email? :)

Dominique: thx for sending that PR!

Ciao, Thorsten
Vlastimil Babka Oct. 21, 2024, 8:42 a.m. UTC | #7
On 10/19/24 02:02, Dominique Martinet wrote:
> Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200:
>> > In that case: Linus, given the circumstances I wonder if it would be
>> > best if you could merge the patch at the start of this thread
>> > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
>> > ) directly, which can also be found as 79efebae4afc22 in -next if you
>> > prefer to cherry-pick it from there  Either now or after 24 to 36 hours,
>> > which would give Eric a chance to ACK/NACK this if he sees this mail.
> 
> Sorry to everyone involved, I've just sent the merge request - I wasn't
> much at computer in the past few weeks so wanted to wait until I got
> back home to send it just in case, as I didn't realize this was a recent
> regression that caused actual harm (it sounded like an old warning that
> someone just recently happened to hit, and sounded easy enough to work
> around locally if there is only one specific setup involved); I should
> have sent the fix separately or at least corrected myself about the
> schedule.
> 
>> >>> It is causing regressions in my environment
>> >>> #regzbot introduced: 4c39529663b9
>> > 
>> > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate
>> > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing
>> > Vlastimil, so he knows about this.
> 
> (that might have been nice to have as a Fixes tag for eventual backport,
> but at least that commit doesn't appear to have been picked up by stable
> so it's probably fine as is)

Yeah it's missing the tag because I believe Pedro sent the 9p fix before
sending the new warning patch itself, so there was even no commit ID yet.
The plan was to introduce the warning only after all pre-existing in-tree
code that would triggers it was fixed. I just assumed the fix would be
mainlined before/at the same time as the warning itself, but forgot to
check. In any case if I see e.g. autosel picking the warning patch for
stable, I'll object.

Thanks,
Vlastimil

> 
> Thanks,
Omar Sandoval Oct. 21, 2024, 6:37 p.m. UTC | #8
On Mon, Oct 21, 2024 at 10:42:02AM +0200, Vlastimil Babka wrote:
> On 10/19/24 02:02, Dominique Martinet wrote:
> > Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200:
> >> > In that case: Linus, given the circumstances I wonder if it would be
> >> > best if you could merge the patch at the start of this thread
> >> > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/
> >> > ) directly, which can also be found as 79efebae4afc22 in -next if you
> >> > prefer to cherry-pick it from there  Either now or after 24 to 36 hours,
> >> > which would give Eric a chance to ACK/NACK this if he sees this mail.
> > 
> > Sorry to everyone involved, I've just sent the merge request - I wasn't
> > much at computer in the past few weeks so wanted to wait until I got
> > back home to send it just in case, as I didn't realize this was a recent
> > regression that caused actual harm (it sounded like an old warning that
> > someone just recently happened to hit, and sounded easy enough to work
> > around locally if there is only one specific setup involved); I should
> > have sent the fix separately or at least corrected myself about the
> > schedule.
> > 
> >> >>> It is causing regressions in my environment
> >> >>> #regzbot introduced: 4c39529663b9
> >> > 
> >> > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate
> >> > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing
> >> > Vlastimil, so he knows about this.
> > 
> > (that might have been nice to have as a Fixes tag for eventual backport,
> > but at least that commit doesn't appear to have been picked up by stable
> > so it's probably fine as is)
> 
> Yeah it's missing the tag because I believe Pedro sent the 9p fix before
> sending the new warning patch itself, so there was even no commit ID yet.
> The plan was to introduce the warning only after all pre-existing in-tree
> code that would triggers it was fixed. I just assumed the fix would be
> mainlined before/at the same time as the warning itself, but forgot to
> check. In any case if I see e.g. autosel picking the warning patch for
> stable, I'll object.
> 
> Thanks,
> Vlastimil

FYI, drgn's CI started getting EIO errors from
getdents("/sys/kernel/slab") that I bisected to this patch. The problem
is that dev_name can be an arbitrary string. In my case, it is
"/dev/root". This trips verify_dirent_name(), which fails if a filename
contains a slash.

This needs to use a different unique identifier. Maybe clnt->msize? But
then the kmem_caches will need to be shared between different mounts
using the same msize.

In any case, can this be reverted for now?

Thanks,
Omar
Linus Torvalds Oct. 21, 2024, 6:57 p.m. UTC | #9
On Mon, 21 Oct 2024 at 11:37, Omar Sandoval <osandov@osandov.com> wrote:
>
> FYI, drgn's CI started getting EIO errors from
> getdents("/sys/kernel/slab") that I bisected to this patch. The problem
> is that dev_name can be an arbitrary string. In my case, it is
> "/dev/root". This trips verify_dirent_name(), which fails if a filename
> contains a slash.

Bah. Does something silly like this fix it:

  --- a/net/9p/client.c
  +++ b/net/9p/client.c
  @@ -977,6 +977,7 @@ static int p9_client_version(struct p9_client *c)
   struct p9_client *p9_client_create(const char *dev_name, char *options)
   {
        int err;
  +     static atomic_t seqno = ATOMIC_INIT(0);
        struct p9_client *clnt;
        char *client_id;
        char *cache_name;
  @@ -1036,7 +1037,8 @@ struct p9_client *p9_client_create(const char
*dev_name, char *options)
        if (err)
                goto close_trans;

  -     cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name);
  +     cache_name = kasprintf(GFP_KERNEL,
  +             "9p-fcall-cache-%d", atomic_inc_return(&seqno));
        if (!cache_name) {
                err = -ENOMEM;
                goto close_trans;

(whitespace-damaged, but you get the idea)

                  Linus
Omar Sandoval Oct. 21, 2024, 8:06 p.m. UTC | #10
On Mon, Oct 21, 2024 at 11:57:38AM -0700, Linus Torvalds wrote:
> On Mon, 21 Oct 2024 at 11:37, Omar Sandoval <osandov@osandov.com> wrote:
> >
> > FYI, drgn's CI started getting EIO errors from
> > getdents("/sys/kernel/slab") that I bisected to this patch. The problem
> > is that dev_name can be an arbitrary string. In my case, it is
> > "/dev/root". This trips verify_dirent_name(), which fails if a filename
> > contains a slash.
> 
> Bah. Does something silly like this fix it:
> 
>   --- a/net/9p/client.c
>   +++ b/net/9p/client.c
>   @@ -977,6 +977,7 @@ static int p9_client_version(struct p9_client *c)
>    struct p9_client *p9_client_create(const char *dev_name, char *options)
>    {
>         int err;
>   +     static atomic_t seqno = ATOMIC_INIT(0);
>         struct p9_client *clnt;
>         char *client_id;
>         char *cache_name;
>   @@ -1036,7 +1037,8 @@ struct p9_client *p9_client_create(const char
> *dev_name, char *options)
>         if (err)
>                 goto close_trans;
> 
>   -     cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name);
>   +     cache_name = kasprintf(GFP_KERNEL,
>   +             "9p-fcall-cache-%d", atomic_inc_return(&seqno));
>         if (!cache_name) {
>                 err = -ENOMEM;
>                 goto close_trans;
> 
> (whitespace-damaged, but you get the idea)

Yup, that works here.

Tested-by: Omar Sandoval <osandov@fb.com>
Vlastimil Babka Oct. 21, 2024, 8:18 p.m. UTC | #11
On 10/21/24 20:37, Omar Sandoval wrote:
> On Mon, Oct 21, 2024 at 10:42:02AM +0200, Vlastimil Babka wrote:
> 
> FYI, drgn's CI started getting EIO errors from
> getdents("/sys/kernel/slab") that I bisected to this patch. The problem
> is that dev_name can be an arbitrary string. In my case, it is
> "/dev/root". This trips verify_dirent_name(), which fails if a filename
> contains a slash.
> 
> This needs to use a different unique identifier. Maybe clnt->msize? But
> then the kmem_caches will need to be shared between different mounts
> using the same msize.

Yep, Dominique mentioned that here too:

https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/

And yes, slab has internal merging of compatible caches enabled by default.
But since it can be disabled, that would indeed result in duplicate names
again if 9p mounts didn't track and reuse caches with same msize accross
mounts on its own.

Linus's suggestion seems like the easiest fix for now.

> In any case, can this be reverted for now?
> 
> Thanks,
> Omar
Linus Torvalds Oct. 21, 2024, 11 p.m. UTC | #12
On Mon, 21 Oct 2024 at 13:06, Omar Sandoval <osandov@osandov.com> wrote:
>
> Yup, that works here.
>
> Tested-by: Omar Sandoval <osandov@fb.com>

Ok, I changed the '%d' to a '%u' (because go big or go home), and applied it.

Hopefully I didn't screw anything up in the process.

              Linus
diff mbox series

Patch

diff --git a/net/9p/client.c b/net/9p/client.c
index 5cd94721d97..9e7b9151816 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -979,6 +979,7 @@  struct p9_client *p9_client_create(const char *dev_name, char *options)
 	int err;
 	struct p9_client *clnt;
 	char *client_id;
+	char *cache_name;
 
 	clnt = kmalloc(sizeof(*clnt), GFP_KERNEL);
 	if (!clnt)
@@ -1035,15 +1036,22 @@  struct p9_client *p9_client_create(const char *dev_name, char *options)
 	if (err)
 		goto close_trans;
 
+	cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name);
+	if (!cache_name) {
+		err = -ENOMEM;
+		goto close_trans;
+	}
+
 	/* P9_HDRSZ + 4 is the smallest packet header we can have that is
 	 * followed by data accessed from userspace by read
 	 */
 	clnt->fcall_cache =
-		kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize,
+		kmem_cache_create_usercopy(cache_name, clnt->msize,
 					   0, 0, P9_HDRSZ + 4,
 					   clnt->msize - (P9_HDRSZ + 4),
 					   NULL);
 
+	kfree(cache_name);
 	return clnt;
 
 close_trans: