Message ID | 20240807094725.2193423-1-pedro.falcato@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | 9p: Avoid creating multiple slab caches with the same name | expand |
On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote: > In the spirit of [1], avoid creating multiple slab caches with the same > name. Instead, add the dev_name into the mix. > > [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/ > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> > --- > net/9p/client.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) Can this get picked up to rc4 please? It is causing regressions in my environment #regzbot introduced: 4c39529663b9 Thanks, Jason
On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote: > > In the spirit of [1], avoid creating multiple slab caches with the same > > name. Instead, add the dev_name into the mix. > > > > [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/ > > > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> > > --- > > net/9p/client.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > Can this get picked up to rc4 please? FWIW, seems to have been picked up into 9pfs-next (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259). Seems like we're just missing a PR to Linus?
[reinserting a bit of quotation below for Linus] TLDR: Linus, I wonder if it would be best if you could merge the patch at the start of this thread (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ ), which can also be found as 79efebae4afc22 in -next if you prefer to cherry-pick it from there. Either now or after 24 to 36 hours, which would give Eric a chance to ACK/NACK this if he sees this mail. For details see below. On 18.10.24 20:36, Pedro Falcato wrote: > On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote: >> >> On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote: >>> In the spirit of [1], avoid creating multiple slab caches with the same >>> name. Instead, add the dev_name into the mix. >>> >>> [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/ >>> >>> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> >>> --- >>> net/9p/client.c | 10 +++++++++- >>> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> Can this get picked up to rc4 please? Eric apparently has not much time or upstream work currently (but recently showed up in a discussion about another regression: https://lore.kernel.org/all/Zw-J0DdrCFLYpT5y@codewreck.org/ ). >> It is causing regressions in my environment >> #regzbot introduced: 4c39529663b9 If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing Vlastimil, so he knows about this. > FWIW, seems to have been picked up into 9pfs-next > (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259). > Seems like we're just missing a PR to Linus? In that case: Linus, given the circumstances I wonder if it would be best if you could merge the patch at the start of this thread (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ ) directly, which can also be found as 79efebae4afc22 in -next if you prefer to cherry-pick it from there Either now or after 24 to 36 hours, which would give Eric a chance to ACK/NACK this if he sees this mail. Ciao, Thorsten
On 10/18/24 8:54 PM, Thorsten Leemhuis wrote: > [reinserting a bit of quotation below for Linus] > > TLDR: Linus, I wonder if it would be best if you could merge the patch > at the start of this thread > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ > ), which can also be found as 79efebae4afc22 in -next if you prefer to > cherry-pick it from there. Either now or after 24 to 36 hours, which > would give Eric a chance to ACK/NACK this if he sees this mail. > > For details see below. > > On 18.10.24 20:36, Pedro Falcato wrote: >> On Fri, Oct 18, 2024 at 6:28 PM Jason Gunthorpe <jgg@nvidia.com> wrote: >>> >>> On Wed, Aug 07, 2024 at 10:47:25AM +0100, Pedro Falcato wrote: >>>> In the spirit of [1], avoid creating multiple slab caches with the same >>>> name. Instead, add the dev_name into the mix. >>>> >>>> [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/ >>>> >>>> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> >>>> --- >>>> net/9p/client.c | 10 +++++++++- >>>> 1 file changed, 9 insertions(+), 1 deletion(-) >>> >>> Can this get picked up to rc4 please? > > Eric apparently has not much time or upstream work currently (but > recently showed up in a discussion about another regression: > https://lore.kernel.org/all/Zw-J0DdrCFLYpT5y@codewreck.org/ ). > >>> It is causing regressions in my environment >>> #regzbot introduced: 4c39529663b9 > > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing > Vlastimil, so he knows about this. > >> FWIW, seems to have been picked up into 9pfs-next >> (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259). >> Seems like we're just missing a PR to Linus? If was Dominique, not Eric, see: https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/ Can you send it as mentioned in the email? :) > In that case: Linus, given the circumstances I wonder if it would be > best if you could merge the patch at the start of this thread > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ > ) directly, which can also be found as 79efebae4afc22 in -next if you > prefer to cherry-pick it from there Either now or after 24 to 36 hours, > which would give Eric a chance to ACK/NACK this if he sees this mail. > > Ciao, Thorsten
Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200: > > In that case: Linus, given the circumstances I wonder if it would be > > best if you could merge the patch at the start of this thread > > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ > > ) directly, which can also be found as 79efebae4afc22 in -next if you > > prefer to cherry-pick it from there Either now or after 24 to 36 hours, > > which would give Eric a chance to ACK/NACK this if he sees this mail. Sorry to everyone involved, I've just sent the merge request - I wasn't much at computer in the past few weeks so wanted to wait until I got back home to send it just in case, as I didn't realize this was a recent regression that caused actual harm (it sounded like an old warning that someone just recently happened to hit, and sounded easy enough to work around locally if there is only one specific setup involved); I should have sent the fix separately or at least corrected myself about the schedule. > >>> It is causing regressions in my environment > >>> #regzbot introduced: 4c39529663b9 > > > > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate > > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing > > Vlastimil, so he knows about this. (that might have been nice to have as a Fixes tag for eventual backport, but at least that commit doesn't appear to have been picked up by stable so it's probably fine as is) Thanks,
On 18.10.24 23:38, Vlastimil Babka wrote: > On 10/18/24 8:54 PM, Thorsten Leemhuis wrote: >>> FWIW, seems to have been picked up into 9pfs-next >>> (https://github.com/martinetd/linux/commit/79efebae4afc2221fa814c3cae001bede66ab259). >>> Seems like we're just missing a PR to Linus? > > If was Dominique, not Eric, see: > https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/ Ahh, sorry, missed that. Apologies again. > Can you send it as mentioned in the email? :) Dominique: thx for sending that PR! Ciao, Thorsten
On 10/19/24 02:02, Dominique Martinet wrote: > Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200: >> > In that case: Linus, given the circumstances I wonder if it would be >> > best if you could merge the patch at the start of this thread >> > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ >> > ) directly, which can also be found as 79efebae4afc22 in -next if you >> > prefer to cherry-pick it from there Either now or after 24 to 36 hours, >> > which would give Eric a chance to ACK/NACK this if he sees this mail. > > Sorry to everyone involved, I've just sent the merge request - I wasn't > much at computer in the past few weeks so wanted to wait until I got > back home to send it just in case, as I didn't realize this was a recent > regression that caused actual harm (it sounded like an old warning that > someone just recently happened to hit, and sounded easy enough to work > around locally if there is only one specific setup involved); I should > have sent the fix separately or at least corrected myself about the > schedule. > >> >>> It is causing regressions in my environment >> >>> #regzbot introduced: 4c39529663b9 >> > >> > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate >> > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing >> > Vlastimil, so he knows about this. > > (that might have been nice to have as a Fixes tag for eventual backport, > but at least that commit doesn't appear to have been picked up by stable > so it's probably fine as is) Yeah it's missing the tag because I believe Pedro sent the 9p fix before sending the new warning patch itself, so there was even no commit ID yet. The plan was to introduce the warning only after all pre-existing in-tree code that would triggers it was fixed. I just assumed the fix would be mainlined before/at the same time as the warning itself, but forgot to check. In any case if I see e.g. autosel picking the warning patch for stable, I'll object. Thanks, Vlastimil > > Thanks,
On Mon, Oct 21, 2024 at 10:42:02AM +0200, Vlastimil Babka wrote: > On 10/19/24 02:02, Dominique Martinet wrote: > > Vlastimil Babka wrote on Fri, Oct 18, 2024 at 11:38:04PM +0200: > >> > In that case: Linus, given the circumstances I wonder if it would be > >> > best if you could merge the patch at the start of this thread > >> > (https://lore.kernel.org/all/20240807094725.2193423-1-pedro.falcato@gmail.com/ > >> > ) directly, which can also be found as 79efebae4afc22 in -next if you > >> > prefer to cherry-pick it from there Either now or after 24 to 36 hours, > >> > which would give Eric a chance to ACK/NACK this if he sees this mail. > > > > Sorry to everyone involved, I've just sent the merge request - I wasn't > > much at computer in the past few weeks so wanted to wait until I got > > back home to send it just in case, as I didn't realize this was a recent > > regression that caused actual harm (it sounded like an old warning that > > someone just recently happened to hit, and sounded easy enough to work > > around locally if there is only one specific setup involved); I should > > have sent the fix separately or at least corrected myself about the > > schedule. > > > >> >>> It is causing regressions in my environment > >> >>> #regzbot introduced: 4c39529663b9 > >> > > >> > If anyone wonders, that is 4c39529663b931 ("slab: Warn on duplicate > >> > cache names when DEBUG_VM=y") [v6.12-rc1]. That's also why I'm CCing > >> > Vlastimil, so he knows about this. > > > > (that might have been nice to have as a Fixes tag for eventual backport, > > but at least that commit doesn't appear to have been picked up by stable > > so it's probably fine as is) > > Yeah it's missing the tag because I believe Pedro sent the 9p fix before > sending the new warning patch itself, so there was even no commit ID yet. > The plan was to introduce the warning only after all pre-existing in-tree > code that would triggers it was fixed. I just assumed the fix would be > mainlined before/at the same time as the warning itself, but forgot to > check. In any case if I see e.g. autosel picking the warning patch for > stable, I'll object. > > Thanks, > Vlastimil FYI, drgn's CI started getting EIO errors from getdents("/sys/kernel/slab") that I bisected to this patch. The problem is that dev_name can be an arbitrary string. In my case, it is "/dev/root". This trips verify_dirent_name(), which fails if a filename contains a slash. This needs to use a different unique identifier. Maybe clnt->msize? But then the kmem_caches will need to be shared between different mounts using the same msize. In any case, can this be reverted for now? Thanks, Omar
On Mon, 21 Oct 2024 at 11:37, Omar Sandoval <osandov@osandov.com> wrote: > > FYI, drgn's CI started getting EIO errors from > getdents("/sys/kernel/slab") that I bisected to this patch. The problem > is that dev_name can be an arbitrary string. In my case, it is > "/dev/root". This trips verify_dirent_name(), which fails if a filename > contains a slash. Bah. Does something silly like this fix it: --- a/net/9p/client.c +++ b/net/9p/client.c @@ -977,6 +977,7 @@ static int p9_client_version(struct p9_client *c) struct p9_client *p9_client_create(const char *dev_name, char *options) { int err; + static atomic_t seqno = ATOMIC_INIT(0); struct p9_client *clnt; char *client_id; char *cache_name; @@ -1036,7 +1037,8 @@ struct p9_client *p9_client_create(const char *dev_name, char *options) if (err) goto close_trans; - cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name); + cache_name = kasprintf(GFP_KERNEL, + "9p-fcall-cache-%d", atomic_inc_return(&seqno)); if (!cache_name) { err = -ENOMEM; goto close_trans; (whitespace-damaged, but you get the idea) Linus
On Mon, Oct 21, 2024 at 11:57:38AM -0700, Linus Torvalds wrote: > On Mon, 21 Oct 2024 at 11:37, Omar Sandoval <osandov@osandov.com> wrote: > > > > FYI, drgn's CI started getting EIO errors from > > getdents("/sys/kernel/slab") that I bisected to this patch. The problem > > is that dev_name can be an arbitrary string. In my case, it is > > "/dev/root". This trips verify_dirent_name(), which fails if a filename > > contains a slash. > > Bah. Does something silly like this fix it: > > --- a/net/9p/client.c > +++ b/net/9p/client.c > @@ -977,6 +977,7 @@ static int p9_client_version(struct p9_client *c) > struct p9_client *p9_client_create(const char *dev_name, char *options) > { > int err; > + static atomic_t seqno = ATOMIC_INIT(0); > struct p9_client *clnt; > char *client_id; > char *cache_name; > @@ -1036,7 +1037,8 @@ struct p9_client *p9_client_create(const char > *dev_name, char *options) > if (err) > goto close_trans; > > - cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name); > + cache_name = kasprintf(GFP_KERNEL, > + "9p-fcall-cache-%d", atomic_inc_return(&seqno)); > if (!cache_name) { > err = -ENOMEM; > goto close_trans; > > (whitespace-damaged, but you get the idea) Yup, that works here. Tested-by: Omar Sandoval <osandov@fb.com>
On 10/21/24 20:37, Omar Sandoval wrote: > On Mon, Oct 21, 2024 at 10:42:02AM +0200, Vlastimil Babka wrote: > > FYI, drgn's CI started getting EIO errors from > getdents("/sys/kernel/slab") that I bisected to this patch. The problem > is that dev_name can be an arbitrary string. In my case, it is > "/dev/root". This trips verify_dirent_name(), which fails if a filename > contains a slash. > > This needs to use a different unique identifier. Maybe clnt->msize? But > then the kmem_caches will need to be shared between different mounts > using the same msize. Yep, Dominique mentioned that here too: https://lore.kernel.org/all/ZvBIl8b9RRK9jgtJ@codewreck.org/ And yes, slab has internal merging of compatible caches enabled by default. But since it can be disabled, that would indeed result in duplicate names again if 9p mounts didn't track and reuse caches with same msize accross mounts on its own. Linus's suggestion seems like the easiest fix for now. > In any case, can this be reverted for now? > > Thanks, > Omar
On Mon, 21 Oct 2024 at 13:06, Omar Sandoval <osandov@osandov.com> wrote: > > Yup, that works here. > > Tested-by: Omar Sandoval <osandov@fb.com> Ok, I changed the '%d' to a '%u' (because go big or go home), and applied it. Hopefully I didn't screw anything up in the process. Linus
diff --git a/net/9p/client.c b/net/9p/client.c index 5cd94721d97..9e7b9151816 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -979,6 +979,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options) int err; struct p9_client *clnt; char *client_id; + char *cache_name; clnt = kmalloc(sizeof(*clnt), GFP_KERNEL); if (!clnt) @@ -1035,15 +1036,22 @@ struct p9_client *p9_client_create(const char *dev_name, char *options) if (err) goto close_trans; + cache_name = kasprintf(GFP_KERNEL, "9p-fcall-cache-%s", dev_name); + if (!cache_name) { + err = -ENOMEM; + goto close_trans; + } + /* P9_HDRSZ + 4 is the smallest packet header we can have that is * followed by data accessed from userspace by read */ clnt->fcall_cache = - kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize, + kmem_cache_create_usercopy(cache_name, clnt->msize, 0, 0, P9_HDRSZ + 4, clnt->msize - (P9_HDRSZ + 4), NULL); + kfree(cache_name); return clnt; close_trans:
In the spirit of [1], avoid creating multiple slab caches with the same name. Instead, add the dev_name into the mix. [1]: https://lore.kernel.org/all/20240807090746.2146479-1-pedro.falcato@gmail.com/ Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> --- net/9p/client.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)