Message ID | CAFBinCDjysBNJLQYsvzBU7U2p7gv0Lxa+qe8f5YMn0BgUw6P0g@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
On Sun, May 14, 2017 at 11:20 PM, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:
> Hello,
>
> it seems that I am seeing some strange memory corruption on one of my
> Amlogic Meson MX (32-bit) devices.
> disclaimer: I have some patches in my tree which are not mainlined yet
> (see [0]), but cannot see that any of these patches would cause memory
> corruption of a clk_core instance.
>
> Oleg (who is CC'ed) has first reported this when testing my kernel tree: [1]
> in the meantime I have rebased all of my patches to Linus' mainline
> tree, commit 0fcc3ab23d7395f58e8ab0834e7913e2e4314a83 [2]
>
> what I am seeing is a NULL deref in clk_disable_unused_subtree, full
> log attached and can be found here: [3]
> an explanation of what seems to be going on in my own words is:
> - in line #5 of the log the internal PWM mux clock for the first PWM
> channel is being registered (everything looks good with
> clk_core=0xeddfbf80 and clk_hw=0xeddfbf30)
> - the default parent of this mux is "xtal"
> - in line #31 of the log the "disable unused clocks" cleanup starts
> and checks the first child of the "xtal" clock and seems to find
> clk_core=0xeddfbf80 *BUT* clk_hw=0x00000003
> - this doesn't seem right and a crash is pretty obvious
>
> I also attached the patch which introduces this additional logspam -
> just in case anyone wants to know what these values mean exactly.
>
> now the interesting part:
> I can reproduce this with multi_v7_defconfig and
> arch/arm/boot/dts/meson8m2-m8s.dts from my tree.
> if I leave everything as it is and *only* enable CONFIG_DEBUG_SPINLOCK
> then this crash goes away. so this *might* be a race-condition
> somewhere...

a user named "wilson2000" (since I missed you on IRC: thank you!)
pointed out on IRC that there's a memory corruption bug in v4.11 and
early v4.12 kernels which is fixed by [0] "perf/core: Avoid removing
shared pmu_context on unregister"
I have not tested this yet but this looks suspicious (so the common
clock framework may be innocent). I will report back once I had time
to test this.

> has anybody seen this crash before? I can help debugging/testing
> potential fixes/trying out various things to solve this - just let me
> know!
>
>
> Regards,
> Martin
>
>
> [0] https://github.com/xdarklight/linux/tree/meson-mx-integration-4.12-20170513
> [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-May/003497.html
> [2] https://github.com/torvalds/linux/commit/0fcc3ab23d7395f58e8ab0834e7913e2e4314a83
> [3] https://paste.kde.org/pbefvmqgr

[0] https://cgit.freedesktop.org/drm/drm-intel/commit/?h=drm-intel-nightly&id=73ac44749e71333bce7d2f8c0bbdc1bbc57dae1b

--
To unsubscribe from this list: send the line "unsubscribe linux-clk" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 18, 2017 at 8:36 PM, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:
> On Sun, May 14, 2017 at 11:20 PM, Martin Blumenstingl
> <martin.blumenstingl@googlemail.com> wrote:
>> Hello,
>>
>> it seems that I am seeing some strange memory corruption on one of my
>> Amlogic Meson MX (32-bit) devices.
>> disclaimer: I have some patches in my tree which are not mainlined yet
>> (see [0]), but cannot see that any of these patches would cause memory
>> corruption of a clk_core instance.
>>
>> Oleg (who is CC'ed) has first reported this when testing my kernel tree: [1]
>> in the meantime I have rebased all of my patches to Linus' mainline
>> tree, commit 0fcc3ab23d7395f58e8ab0834e7913e2e4314a83 [2]
>>
>> what I am seeing is a NULL deref in clk_disable_unused_subtree, full
>> log attached and can be found here: [3]
>> an explanation of what seems to be going on in my own words is:
>> - in line #5 of the log the internal PWM mux clock for the first PWM
>> channel is being registered (everything looks good with
>> clk_core=0xeddfbf80 and clk_hw=0xeddfbf30)
>> - the default parent of this mux is "xtal"
>> - in line #31 of the log the "disable unused clocks" cleanup starts
>> and checks the first child of the "xtal" clock and seems to find
>> clk_core=0xeddfbf80 *BUT* clk_hw=0x00000003
>> - this doesn't seem right and a crash is pretty obvious
>>
>> I also attached the patch which introduces this additional logspam -
>> just in case anyone wants to know what these values mean exactly.
>>
>> now the interesting part:
>> I can reproduce this with multi_v7_defconfig and
>> arch/arm/boot/dts/meson8m2-m8s.dts from my tree.
>> if I leave everything as it is and *only* enable CONFIG_DEBUG_SPINLOCK
>> then this crash goes away. so this *might* be a race-condition
>> somewhere...
> a user named "wilson2000" (since I missed you on IRC: thank you!)
> pointed out on IRC that there's a memory corruption bug in v4.11 and
> early v4.12 kernels which is fixed by [0] "perf/core: Avoid removing
> shared pmu_context on unregister"
> I have not tested this yet but this looks suspicious (so the common
> clock framework may be innocent). I will report back once I had time
> to test this.

I applied that patch and re-tested this: unfortunately it still
crashes with the same symptoms, so I am still interested in any kind
of hint.

>> has anybody seen this crash before? I can help debugging/testing
>> potential fixes/trying out various things to solve this - just let me
>> know!
>>
>>
>> Regards,
>> Martin
>>
>>
>> [0] https://github.com/xdarklight/linux/tree/meson-mx-integration-4.12-20170513
>> [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-May/003497.html
>> [2] https://github.com/torvalds/linux/commit/0fcc3ab23d7395f58e8ab0834e7913e2e4314a83
>> [3] https://paste.kde.org/pbefvmqgr
>
>
> [0] https://cgit.freedesktop.org/drm/drm-intel/commit/?h=drm-intel-nightly&id=73ac44749e71333bce7d2f8c0bbdc1bbc57dae1b
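The symptom described in the report (a clk_core that still looks valid while its clk_hw pointer reads 0x00000003) can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct model_core`, `struct model_hw`, `model_find_bad_hw()` and the 4096-byte threshold are hypothetical names and assumptions chosen for illustration. It walks a parent/child tree and reports the first node whose hw pointer is too small to be a real address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct clk_hw / struct clk_core. */
struct model_hw {
	int enable_count;
};

struct model_core {
	const char *name;
	struct model_hw *hw;
	struct model_core *first_child;  /* head of this node's child list */
	struct model_core *next_sibling; /* next entry in the parent's list */
};

/* Assumption: anything below one 4 KiB page cannot be a valid object. */
#define MODEL_MIN_VALID_ADDR ((uintptr_t)4096)

/* Returns true if hw could plausibly point at a real object. */
static bool model_hw_plausible(const struct model_hw *hw)
{
	return (uintptr_t)hw >= MODEL_MIN_VALID_ADDR;
}

/*
 * Depth-first walk starting at core. Checks the node itself first, then
 * each child subtree. Returns the first node with an implausible hw
 * pointer, or NULL if every node in the subtree looks sane.
 */
static const struct model_core *model_find_bad_hw(const struct model_core *core)
{
	const struct model_core *child;
	const struct model_core *bad;

	if (!core)
		return NULL;
	if (!model_hw_plausible(core->hw))
		return core;

	for (child = core->first_child; child; child = child->next_sibling) {
		bad = model_find_bad_hw(child);
		if (bad)
			return bad;
	}
	return NULL;
}
```

Calling `model_find_bad_hw()` on a model of the "xtal" node whose only child has `hw` set to `(struct model_hw *)0x3` returns that child. A check of this kind only detects the corruption; it says nothing about which code overwrote the pointer.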
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index fc58c52a26b4..1942fe2c28b0 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -759,7 +759,7 @@ static void clk_disable_unused_subtree(struct clk_core *core)
 {
 	struct clk_core *child;
 	unsigned long flags;
-
+printk("%s(0x%08x/0x%08x/%s) - empty=%d\n", __func__, core, core->hw, core->name, hlist_empty(&core->children));
 	lockdep_assert_held(&prepare_lock);
 	hlist_for_each_entry(child, &core->children, child_node)
@@ -1795,7 +1795,7 @@ static int clk_core_set_parent(struct clk_core *core, struct clk_core *parent)
 	if (!core)
 		return 0;
-
+printk("%s(0x%08x/0x%08x/%s, %s)\n", __func__, core, core->hw, core->name, parent ? parent->name : "NULL");
 	/* prevent racing with updates to the clock topology */
 	clk_prepare_lock();
@@ -2422,6 +2422,7 @@ static int __clk_core_init(struct clk_core *core)
 		hlist_add_head(&core->child_node, &core->parent->children);
 		core->orphan = core->parent->orphan;
+		printk("%s(0x%08x/0x%08x/%s) -> parent = 0x%08x/0x%08x/%s\n", __func__, core, core->hw, core->name, core->parent, core->parent->hw, core->parent->name);
 	} else if (!core->num_parents) {
 		hlist_add_head(&core->child_node, &clk_root_list);
 		core->orphan = false;
@@ -2621,17 +2622,18 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
 	};
 	INIT_HLIST_HEAD(&core->clks);
+	INIT_HLIST_HEAD(&core->children);
 	hw->clk = __clk_create_clk(hw, NULL, NULL);
 	if (IS_ERR(hw->clk)) {
 		ret = PTR_ERR(hw->clk);
 		goto fail_parents;
 	}
-
+printk("%s(0x%08x/0x%08x/%s)\n", __func__, core, core->hw, core->name);
 	ret = __clk_core_init(core);
 	if (!ret)
 		return hw->clk;
-
+printk("...failed!\n");
 	__clk_free_clk(hw->clk);
 	hw->clk = NULL;
@@ -2671,7 +2673,7 @@ static void __clk_release(struct kref *ref)
 {
 	struct clk_core *core = container_of(ref, struct clk_core, ref);
 	int i = core->num_parents;
-
+printk("%s(0x%08x/%s)\n", __func__, core->hw, core->name);
 	lockdep_assert_held(&prepare_lock);
 	kfree(core->parents);
@@ -2728,7 +2730,7 @@ void clk_unregister(struct clk *clk)
 	if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
 		return;
-
+printk("%s(%s)\n", __func__, clk->core->name);
 	clk_debug_unregister(clk->core);
 	clk_prepare_lock();
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 4bf0b543ad50..c7238782557e 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -121,12 +121,15 @@ static inline struct meson_pwm *to_meson_pwm(struct pwm_chip *chip)
 static int meson_pwm_request(struct pwm_chip *chip, struct pwm_device *pwm)
 {
 	struct meson_pwm_channel *channel = pwm_get_chip_data(pwm);
+	struct meson_pwm *meson = to_meson_pwm(chip);
+	struct meson_pwm_channel *chan0 = pwm_get_chip_data(&meson->chip.pwms[0]);
 	struct device *dev = chip->dev;
 	int err;
 	if (!channel)
 		return -ENODEV;
-
+printk("%s - channel->mux.hw = 0x%08x\n", __func__, &channel->mux.hw);
+printk("%s - channel0->mux.hw = 0x%08x\n", __func__, &chan0->mux.hw);
 	if (channel->clk_parent) {
 		err = clk_set_parent(channel->clk, channel->clk_parent);
 		if (err < 0) {
@@ -434,6 +437,8 @@ static int meson_pwm_init_channels(struct meson_pwm *meson,
 			err = PTR_ERR(channel->clk);
 			dev_err(dev, "failed to register %s: %d\n", name, err);
 			return err;
+		} else {
+			dev_info(dev, "channel->mux.hw = 0x%08x\n", &channel->mux.hw);
 		}
 		snprintf(name, sizeof(name), "clkin%u", i);
@@ -456,8 +461,10 @@ static void meson_pwm_add_channels(struct meson_pwm *meson,
 {
 	unsigned int i;
-	for (i = 0; i < meson->chip.npwm; i++)
+	for (i = 0; i < meson->chip.npwm; i++) {
 		pwm_set_chip_data(&meson->chip.pwms[i], &channels[i]);
+		printk("%s(%d) = 0x%08x\n", __func__, i, &channels[i].mux.hw);
+	}
 }
 static int meson_pwm_probe(struct platform_device *pdev)
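The debug statements in the patch above pass pointers to a "0x%08x" conversion, which expects an unsigned int. That happens to produce the intended output on a 32-bit target, but the types do not match and compilers normally warn about it. Below is a userspace sketch of a width-safe alternative. `format_ptr()` is a hypothetical helper, not something from the patch or the kernel; inside the kernel the usual choice would be the `%p` family of printk specifiers instead.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes ptr into buf as "0x" followed by zero-padded hex digits, using
 * as many digits as the platform's pointer width needs (8 on a 32-bit
 * target, 16 on a 64-bit one). Returns the snprintf() result, i.e. the
 * number of characters the full string requires.
 */
static int format_ptr(char *buf, size_t len, const void *ptr)
{
	return snprintf(buf, len, "0x%0*" PRIxPTR,
			(int)(2 * sizeof(uintptr_t)), (uintptr_t)ptr);
}
```

Converting through `uintptr_t` and `PRIxPTR` keeps the format string and the argument type in agreement regardless of pointer size, so the same debug line builds cleanly on both 32-bit and 64-bit configurations.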