Message ID | CANk1AXQZ6ax5j9QOrLZA9qT2xGpiU0BQLxAmKQFEqSb6NrbeRQ@mail.gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Tue, Jun 12, 2018 at 1:16 PM, Alan Tull <atull@kernel.org> wrote: > On Sun, Mar 4, 2018 at 6:14 PM, <frowand.list@gmail.com> wrote: > > Hi Frank, > > I'm investigating a refcount use-after-free warning that happens after > overlays are applied, removed, reapplied a few (typically three) times > (see below). This is new in v4.17, didn't happen in v4.16. As I was > investigating I found that rebuilding the phandle_cache after overlays > are applied or removed seems to help. I was probably wrong about this. The more I look at the phandle_cache code, the more it looks looks good and straightforward. Probably disabling phandle_cache is 'fixing' things through some weird side effect. I'll keep investigating. Sorry for the noise. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-fpga" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/13/18 07:42, Alan Tull wrote: > On Tue, Jun 12, 2018 at 1:16 PM, Alan Tull <atull@kernel.org> wrote: >> On Sun, Mar 4, 2018 at 6:14 PM, <frowand.list@gmail.com> wrote: >> >> Hi Frank, >> >> I'm investigating a refcount use-after-free warning that happens after >> overlays are applied, removed, reapplied a few (typically three) times >> (see below). This is new in v4.17, didn't happen in v4.16. As I was >> investigating I found that rebuilding the phandle_cache after overlays >> are applied or removed seems to help. > > I was probably wrong about this. The more I look at the phandle_cache code, > the more it looks looks good and straightforward. Probably disabling > phandle_cache is 'fixing' things through some weird side effect. I'll > keep investigating. Sorry for the noise. I suspect that you have found an issue, even if it is not the cause of the refcount issue. I noted in a reply to v4 of the patch: >> +static void of_populate_phandle_cache(void) >> +{ >> +Â Â Â unsigned long flags; >> +Â Â Â u32 cache_entries; >> +Â Â Â struct device_node *np; >> +Â Â Â u32 phandles = 0; >> + >> +Â Â Â raw_spin_lock_irqsave(&devtree_lock, flags); >> + >> +Â Â Â kfree(phandle_cache); > > I couldn't understood this. Everything else looks good to me. I will be adding a call to of_populate_phandle_cache() from the devicetree overlay code. I put the kfree here so that the previous cache memory is freed when a new cache is created. Adding the call from the overlay code is not done in this series because I have a patch series modifying overlays and I do not want to create a conflict or ordering between that series and that patch. The lack of the call from overlay code means that overlay code will gain some of the overhead reduction from this patch, but possibly not the entire reduction. Sorry I'm not giving a link to the archive of this message - I have a class I have to go to so I don't have enough time to find it. The email was Subject: Re: [PATCH v3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle() Date: Fri, 16 Feb 2018 14:20:22 -0800 Message-ID: <46d5fc76-33e3-d54a-26b8-e9bb8332924d@gmail.com> Quickly looking at the current code, I don't see the overlay patch that I mentioned. I have to dig into what happened to that. Leaving a phandle from an overlay in the phandle cache after the overlay is removed would clearly be a bug. -Frank -- To unsubscribe from this list: send the line "unsubscribe linux-fpga" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 13, 2018 at 4:47 PM, Frank Rowand <frowand.list@gmail.com> wrote: > On 06/13/18 07:42, Alan Tull wrote: >> On Tue, Jun 12, 2018 at 1:16 PM, Alan Tull <atull@kernel.org> wrote: >>> On Sun, Mar 4, 2018 at 6:14 PM, <frowand.list@gmail.com> wrote: >>> >>> Hi Frank, >>> >>> I'm investigating a refcount use-after-free warning that happens after >>> overlays are applied, removed, reapplied a few (typically three) times >>> (see below). This is new in v4.17, didn't happen in v4.16. As I was >>> investigating I found that rebuilding the phandle_cache after overlays >>> are applied or removed seems to help. >> >> I was probably wrong about this. The more I look at the phandle_cache code, >> the more it looks looks good and straightforward. Probably disabling >> phandle_cache is 'fixing' things through some weird side effect. I'll >> keep investigating. Sorry for the noise. > > I suspect that you have found an issue, even if it is not the cause of > the refcount issue. I noted in a reply to v4 of the patch: > > >> +static void of_populate_phandle_cache(void) > >> +{ > >> +Â Â Â unsigned long flags; > >> +Â Â Â u32 cache_entries; > >> +Â Â Â struct device_node *np; > >> +Â Â Â u32 phandles = 0; > >> + > >> +Â Â Â raw_spin_lock_irqsave(&devtree_lock, flags); > >> + > >> +Â Â Â kfree(phandle_cache); > > > > I couldn't understood this. Everything else looks good to me. > > I will be adding a call to of_populate_phandle_cache() from the > devicetree overlay code. I put the kfree here so that the previous > cache memory is freed when a new cache is created. > > Adding the call from the overlay code is not done in this > series because I have a patch series modifying overlays and > I do not want to create a conflict or ordering between that > series and that patch. The lack of the call from overlay > code means that overlay code will gain some of the overhead > reduction from this patch, but possibly not the entire reduction. > > Sorry I'm not giving a link to the archive of this message - I have > a class I have to go to so I don't have enough time to find it. I understand, that's cool. I've found the email chain, thanks for pointing me to it! > The > email was > > Subject: Re: [PATCH v3] of: cache phandle nodes to reduce cost of > of_find_node_by_phandle() > Date: Fri, 16 Feb 2018 14:20:22 -0800 > Message-ID: <46d5fc76-33e3-d54a-26b8-e9bb8332924d@gmail.com> > > Quickly looking at the current code, I don't see the overlay patch > that I mentioned. I have to dig into what happened to that. > > Leaving a phandle from an overlay in the phandle cache after the > overlay is removed would clearly be a bug. Yes that's totally it, when I get a match of a phandle and the np address is stale, then comes the kernel warning. I added more debug code to print out the results of looking up dynamic nodes only (nodes > the highest node added when the cache was originally built). Also I added debug code to also force the lookup code to go back and look up dynamic nodes the slow, uncached way so I could see if it got the correct np address. I'm unloading and reloading an overlay, so sometimes the new np (after the overlay was removed and reapplied) is at the same address, but sometimes it isn't and then I get the kernel warning. Just for entertainment value, here is my prints of the np from the cache and the correct np that' at a different address. The '88' is the phandle, after that is the np address, then the np=<full_name>. Note also the corruption in the cached node's full_name OF: of_find_node_by_phandle line 1108 found dynamic node in cache 88 ee518680 np=/soc/base_fpga_region/\300\252~\356ze_controller@0x100000450 OF: of_find_node_by_phandle line 1119 found the slow way: 88 eeaf2d00 np=/soc/base_fpga_region/freeze_controller@0x100000450 I'm just scratching my head and wondering: should we be zeroing out some things in memory after an overlay is removed? Then this bug would led to a pointer that would have thrown an oops. If it's useful for you, next week I can send a cleaned up version of my patch that rebuilds the cache after overlays are applied or removed. Alan > > -Frank -- To unsubscribe from this list: send the line "unsubscribe linux-fpga" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/of/base.c b/drivers/of/base.c index 848f549..4184273 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -102,7 +102,7 @@ static u32 phandle_cache_mask; * - the phandle lookup overhead reduction provided by the cache * will likely be less */ -static void of_populate_phandle_cache(void) +void of_populate_phandle_cache(void) { unsigned long flags; u32 cache_entries; @@ -133,6 +133,7 @@ static void of_populate_phandle_cache(void) out: raw_spin_unlock_irqrestore(&devtree_lock, flags); } +EXPORT_SYMBOL_GPL(of_populate_phandle_cache); #ifndef CONFIG_MODULES static int __init of_free_phandle_cache(void) diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 7baa53e..55caf42 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -885,6 +885,8 @@ int of_overlay_fdt_apply(const void *overlay_fdt, u32 overlay_fdt_size, goto out; } + of_populate_phandle_cache(); + return 0; @@ -1070,6 +1072,7 @@ int of_overlay_remove(int *ovcs_id) } free_overlay_changeset(ovcs); + of_populate_phandle_cache(); out_unlock: mutex_unlock(&of_mutex); diff --git a/include/linux/of.h b/include/linux/of.h index 4d25e4f..a662d4e 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1342,6 +1342,9 @@ static inline int of_reconfig_get_state_change(unsigned long action, } #endif /* CONFIG_OF_DYNAMIC */ +//todo locate this more correctly, just testing for now +void of_populate_phandle_cache(void); + /** * of_device_is_system_power_controller - Tells if