Message ID | 20220921114444.2247083-1-gregkh@linuxfoundation.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: mvpp2: debugfs: fix problem with previous memory leak fix | expand |
On Wed, Sep 21, 2022 at 01:44:44PM +0200, Greg Kroah-Hartman wrote: > In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > debugfs_lookup()"), if the module is unloaded, the directory will still > be present if the module is loaded again and creating the directory will > fail, causing the creation of additional child debugfs entries for the > individual devices to fail. > > As this module never cleaned up the root directory it created, even when > loaded, and unloading/loading a module is not a normal operation, none > of would normally happen. > > To clean all of this up, use a tiny reference counted structure to hold > the root debugfs directory for the driver, and then clean it up when the > last user of it is removed from the system. This should resolve the > previously reported problems, and the original memory leak that > fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > debugfs_lookup()") was attempting to fix. For the record... I have a better fix for this, but I haven't been able to get it into a state suitable for submission yet. http://www.home.armlinux.org.uk/~rmk/misc/mvpp2-debugfs.diff Not yet against the net tree. Might have time tomorrow to do that, not sure at the moment. Medical stuff is getting in the way. :(
Hi, Thank you both for the patches. czw., 22 wrz 2022 o 18:05 Russell King (Oracle) <linux@armlinux.org.uk> napisał(a): > > On Wed, Sep 21, 2022 at 01:44:44PM +0200, Greg Kroah-Hartman wrote: > > In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > debugfs_lookup()"), if the module is unloaded, the directory will still > > be present if the module is loaded again and creating the directory will > > fail, causing the creation of additional child debugfs entries for the > > individual devices to fail. > > > > As this module never cleaned up the root directory it created, even when > > loaded, and unloading/loading a module is not a normal operation, none > > of would normally happen. > > > > To clean all of this up, use a tiny reference counted structure to hold > > the root debugfs directory for the driver, and then clean it up when the > > last user of it is removed from the system. This should resolve the > > previously reported problems, and the original memory leak that > > fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > debugfs_lookup()") was attempting to fix. > > For the record... I have a better fix for this, but I haven't been able > to get it into a state suitable for submission yet. > > http://www.home.armlinux.org.uk/~rmk/misc/mvpp2-debugfs.diff > > Not yet against the net tree. Might have time tomorrow to do that, not > sure at the moment. Medical stuff is getting in the way. :( > I'd lean towards this version - it is a bit more compact. I'll try to test that tomorrow or during the weekend. Best regards, Marcin
Hi Russell, czw., 22 wrz 2022 o 19:08 Marcin Wojtas <mw@semihalf.com> napisał(a): > > Hi, > > Thank you both for the patches. > > > czw., 22 wrz 2022 o 18:05 Russell King (Oracle) > <linux@armlinux.org.uk> napisał(a): > > > > On Wed, Sep 21, 2022 at 01:44:44PM +0200, Greg Kroah-Hartman wrote: > > > In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > debugfs_lookup()"), if the module is unloaded, the directory will still > > > be present if the module is loaded again and creating the directory will > > > fail, causing the creation of additional child debugfs entries for the > > > individual devices to fail. > > > > > > As this module never cleaned up the root directory it created, even when > > > loaded, and unloading/loading a module is not a normal operation, none > > > of would normally happen. > > > > > > To clean all of this up, use a tiny reference counted structure to hold > > > the root debugfs directory for the driver, and then clean it up when the > > > last user of it is removed from the system. This should resolve the > > > previously reported problems, and the original memory leak that > > > fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > debugfs_lookup()") was attempting to fix. > > > > For the record... I have a better fix for this, but I haven't been able > > to get it into a state suitable for submission yet. > > > > http://www.home.armlinux.org.uk/~rmk/misc/mvpp2-debugfs.diff > > > > Not yet against the net tree. Might have time tomorrow to do that, not > > sure at the moment. Medical stuff is getting in the way. :( > > > > I'd lean towards this version - it is a bit more compact. I'll try to > test that tomorrow or during the weekend. > I improved the patch compile and work (tested on my CN913x board). Feel free to submit (if you wish, I can do it too - please let me know your preference): https://github.com/semihalf-wojtas-marcin/Linux-Kernel/commit/0abb75115ffb2772f595bb3346573e27e650018b Best regards, Marcin
On Sun, Sep 25, 2022 at 01:27:06AM +0200, Marcin Wojtas wrote: > Hi Russell, > > czw., 22 wrz 2022 o 19:08 Marcin Wojtas <mw@semihalf.com> napisał(a): > > > > Hi, > > > > Thank you both for the patches. > > > > > > czw., 22 wrz 2022 o 18:05 Russell King (Oracle) > > <linux@armlinux.org.uk> napisał(a): > > > > > > On Wed, Sep 21, 2022 at 01:44:44PM +0200, Greg Kroah-Hartman wrote: > > > > In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > > debugfs_lookup()"), if the module is unloaded, the directory will still > > > > be present if the module is loaded again and creating the directory will > > > > fail, causing the creation of additional child debugfs entries for the > > > > individual devices to fail. > > > > > > > > As this module never cleaned up the root directory it created, even when > > > > loaded, and unloading/loading a module is not a normal operation, none > > > > of would normally happen. > > > > > > > > To clean all of this up, use a tiny reference counted structure to hold > > > > the root debugfs directory for the driver, and then clean it up when the > > > > last user of it is removed from the system. This should resolve the > > > > previously reported problems, and the original memory leak that > > > > fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > > debugfs_lookup()") was attempting to fix. > > > > > > For the record... I have a better fix for this, but I haven't been able > > > to get it into a state suitable for submission yet. > > > > > > http://www.home.armlinux.org.uk/~rmk/misc/mvpp2-debugfs.diff > > > > > > Not yet against the net tree. Might have time tomorrow to do that, not > > > sure at the moment. Medical stuff is getting in the way. :( > > > > > > > I'd lean towards this version - it is a bit more compact. I'll try to > > test that tomorrow or during the weekend. > > > > I improved the patch compile and work (tested on my CN913x board). > Feel free to submit (if you wish, I can do it too - please let me know > your preference): > https://github.com/semihalf-wojtas-marcin/Linux-Kernel/commit/0abb75115ffb2772f595bb3346573e27e650018b I don't see what the compile fixes were in that - it looks like my patch ported onto current -net. Obvious changes: - moved mvpp2_dbgfs_exit() declaration from after mvpp2_dbgfs_cleanup() to before. - moved definition of mvpp2_root to the top of the file (as no effect on the code.) and the change to port it: - dropped my mvpp2_dbgfs_init() hunk (because it's different in -net) - removed static declaration of mvpp2_root in mvpp2_dbgfs_init() I'm not seeing any other changes. Note that Sasha has submitted a revert of Greg's original patch for mainline, so my original patch should apply as-is if that revert happens - and I don't see any compile issues with it.
On Sun, Sep 25, 2022 at 07:58:29AM +0100, Russell King (Oracle) wrote: > On Sun, Sep 25, 2022 at 01:27:06AM +0200, Marcin Wojtas wrote: > > Hi Russell, > > > > I improved the patch compile and work (tested on my CN913x board). > > Feel free to submit (if you wish, I can do it too - please let me know > > your preference): > > https://github.com/semihalf-wojtas-marcin/Linux-Kernel/commit/0abb75115ffb2772f595bb3346573e27e650018b > > I don't see what the compile fixes were in that - it looks like my patch > ported onto current -net. Obvious changes: > > - moved mvpp2_dbgfs_exit() declaration from after mvpp2_dbgfs_cleanup() > to before. > - moved definition of mvpp2_root to the top of the file (as no effect > on the code.) > > and the change to port it: > > - dropped my mvpp2_dbgfs_init() hunk (because it's different in -net) > - removed static declaration of mvpp2_root in mvpp2_dbgfs_init() > > I'm not seeing any other changes. > > Note that Sasha has submitted a revert of Greg's original patch for > mainline, so my original patch should apply as-is if that revert > happens - and I don't see any compile issues with it. On that - it seems the Stable kernel maintainers can't cope with being told not to apply a patch - we've had a big long discussion about it on IRC over the past few days. Sasha states that the mainline process is broken - and as long as Greg's original patch is in place, stable will repeatedly attempt to backport it no matter whether a proper fix is being worked on, whether maintainers are busy, and so on and so forth. The only way to stop stable backporting patches is to revert them in mainline - as I asked to happen in this case on the 13th. So it's all our fault for not reverting the patch. It's got nothing to do with stable maintainers unable to keep track of which patches they shouldn't be picking up. I maintain that the stable kernel process is totally broken due to this - it makes no allowance for whether maintainers can sort the problem out in a timely manner. Quite honestly, I now regard the stable kernel process as being utterly broken. In my mind, it's not a stable kernel. It's an unstable kernel with randomly applied patches that may or may not be appropriate. Certainly, stable-kernel-rules.rst is a total and utter joke - they aren't rules at all, the stable kernel maintainers don't have any rules about patches they backport. Anyway, I guess we're going to have to wait for Sasha's revert to be merged into -net first, which will make backporting our true and proper memory-leak fix a lot easier.
niedz., 25 wrz 2022 o 08:58 Russell King (Oracle) <linux@armlinux.org.uk> napisał(a): > > On Sun, Sep 25, 2022 at 01:27:06AM +0200, Marcin Wojtas wrote: > > Hi Russell, > > > > czw., 22 wrz 2022 o 19:08 Marcin Wojtas <mw@semihalf.com> napisał(a): > > > > > > Hi, > > > > > > Thank you both for the patches. > > > > > > > > > czw., 22 wrz 2022 o 18:05 Russell King (Oracle) > > > <linux@armlinux.org.uk> napisał(a): > > > > > > > > On Wed, Sep 21, 2022 at 01:44:44PM +0200, Greg Kroah-Hartman wrote: > > > > > In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > > > debugfs_lookup()"), if the module is unloaded, the directory will still > > > > > be present if the module is loaded again and creating the directory will > > > > > fail, causing the creation of additional child debugfs entries for the > > > > > individual devices to fail. > > > > > > > > > > As this module never cleaned up the root directory it created, even when > > > > > loaded, and unloading/loading a module is not a normal operation, none > > > > > of would normally happen. > > > > > > > > > > To clean all of this up, use a tiny reference counted structure to hold > > > > > the root debugfs directory for the driver, and then clean it up when the > > > > > last user of it is removed from the system. This should resolve the > > > > > previously reported problems, and the original memory leak that > > > > > fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using > > > > > debugfs_lookup()") was attempting to fix. > > > > > > > > For the record... I have a better fix for this, but I haven't been able > > > > to get it into a state suitable for submission yet. > > > > > > > > http://www.home.armlinux.org.uk/~rmk/misc/mvpp2-debugfs.diff > > > > > > > > Not yet against the net tree. Might have time tomorrow to do that, not > > > > sure at the moment. Medical stuff is getting in the way. :( > > > > > > > > > > I'd lean towards this version - it is a bit more compact. I'll try to > > > test that tomorrow or during the weekend. > > > > > > > I improved the patch compile and work (tested on my CN913x board). > > Feel free to submit (if you wish, I can do it too - please let me know > > your preference): > > https://github.com/semihalf-wojtas-marcin/Linux-Kernel/commit/0abb75115ffb2772f595bb3346573e27e650018b > > I don't see what the compile fixes were in that - it looks like my patch I overlooked drivers/net/ethernet/marvell/mvpp2/mvpp2.h diff (I applied patch manually), but I double-checked and it was there, so you are right. > ported onto current -net. Obvious changes: Yes, it's applied on top of net/net-next. > > - moved mvpp2_dbgfs_exit() declaration from after mvpp2_dbgfs_cleanup() > to before. > - moved definition of mvpp2_root to the top of the file (as no effect > on the code.) These are irrelevant. The most important part was: --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -7714,8 +7714,8 @@ module_init(mvpp2_driver_init); static void __exit mvpp2_driver_exit(void) { - mvpp2_dbgfs_exit(); platform_driver_unregister(&mvpp2_driver); + mvpp2_dbgfs_exit(); } module_exit(mvpp2_driver_exit); Otherwise, I observed NULL-pointer dereference when mvpp2_remove()->mvpp2_dbgfs_cleanup(). > > and the change to port it: > > - dropped my mvpp2_dbgfs_init() hunk (because it's different in -net) > - removed static declaration of mvpp2_root in mvpp2_dbgfs_init() > > I'm not seeing any other changes. > > Note that Sasha has submitted a revert of Greg's original patch for > mainline, so my original patch should apply as-is if that revert > happens - and I don't see any compile issues with it. > Sure. In order to avoid misunderstandings, I stop here and leave the submission to you. Thanks, Marcin
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c index 0eec05d905eb..16c303048f25 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c @@ -691,23 +691,47 @@ static int mvpp2_dbgfs_port_init(struct dentry *parent, return 0; } +static struct mvpp2_debug_dir { + struct dentry *dir; + struct kref kref; +} *mvpp2_root; + +static void mvpp2_release(struct kref *kref) +{ + struct mvpp2_debug_dir *mvpp2_root = container_of(kref, struct mvpp2_debug_dir, kref); + + debugfs_remove(mvpp2_root->dir); + kfree(mvpp2_root); +} + void mvpp2_dbgfs_cleanup(struct mvpp2 *priv) { debugfs_remove_recursive(priv->dbgfs_dir); kfree(priv->dbgfs_entries); + + if (mvpp2_root) + kref_put(&mvpp2_root->kref, mvpp2_release); } void mvpp2_dbgfs_init(struct mvpp2 *priv, const char *name) { - static struct dentry *mvpp2_root; struct dentry *mvpp2_dir; int ret, i; - if (!mvpp2_root) - mvpp2_root = debugfs_create_dir(MVPP2_DRIVER_NAME, NULL); + if (!mvpp2_root) { + mvpp2_root = kzalloc(sizeof(mvpp2_root), GFP_KERNEL); + if (!mvpp2_root) { + mvpp2_root = NULL; + return; + } + mvpp2_root->dir = debugfs_create_dir(MVPP2_DRIVER_NAME, NULL); + kref_init(&mvpp2_root->kref); + } else { + kref_get(&mvpp2_root->kref); + } - mvpp2_dir = debugfs_create_dir(name, mvpp2_root); + mvpp2_dir = debugfs_create_dir(name, mvpp2_root->dir); priv->dbgfs_dir = mvpp2_dir; priv->dbgfs_entries = kzalloc(sizeof(*priv->dbgfs_entries), GFP_KERNEL);
In commit fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"), if the module is unloaded, the directory will still be present if the module is loaded again and creating the directory will fail, causing the creation of additional child debugfs entries for the individual devices to fail. As this module never cleaned up the root directory it created, even when loaded, and unloading/loading a module is not a normal operation, none of would normally happen. To clean all of this up, use a tiny reference counted structure to hold the root debugfs directory for the driver, and then clean it up when the last user of it is removed from the system. This should resolve the previously reported problems, and the original memory leak that fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()") was attempting to fix. Reported-by: Russell King <linux@armlinux.org.uk> Cc: Marcin Wojtas <mw@semihalf.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Fixes: fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Note, test-built only, I do not have access to this hardware so please review it for any foolish mistakes I might have again made. .../ethernet/marvell/mvpp2/mvpp2_debugfs.c | 32 ++++++++++++++++--- 1 file changed, 28 insertions(+), 4 deletions(-)