Message ID | 20201118145009.10492-1-nm@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | regulator: ti-abb: Fix array out of bound read access on the first transition | expand |
Hi, On Wed, 18 Nov 2020 08:50:09 -0600 Nishanth Menon <nm@ti.com> wrote: > At the start of driver initialization, we do not know what bias > setting the bootloader has configured the system for and we only know > for certain the very first time we do a transition. > > However, since the initial value of the comparison index is -EINVAL, > this negative value results in an array out of bound access on the > very first transition. > > Since we don't know what the setting is, we just set the bias > configuration as there is nothing to compare against. This prevents > the array out of bound access. > > NOTE: Even though we could use a more relaxed check of "< 0" the only > valid values(ignoring cosmic ray induced bitflips) are -EINVAL, 0+. > > Fixes: 40b1936efebd ("regulator: Introduce TI Adaptive Body Bias(ABB) on-chip LDO driver") > Link: https://lore.kernel.org/linux-mm/CA+G9fYuk4imvhyCN7D7T6PMDH6oNp6HDCRiTUKMQ6QXXjBa4ag@mail.gmail.com/ > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> > Reviewed-by: Arnd Bergmann <arnd@arndb.de> > Signed-off-by: Nishanth Menon <nm@ti.com> > --- > > Mark, > > I will leave it to your descretion if this needs to be tagged for > stable or to drop the Fixes tag - Side effect, if this occurs, will be > an unstable system very hard to track down - but typically occurring > during system boot - Impacts systems: DM3/OMAP3,4,5,DRA7/AM5x. > > I would categorize this as "This could be a problem..." problem.. > the bug is an out of bound read, and has been around since v3.11 and is > not a catastrophic data corruption kind of issue. > > Though theoretically, there is a possibility that the compare may > pass and result in missing bias configuration(and resulting system > will be unstable), I have'nt heard of actual report (but, it will be > surprising if any actual instability was actually tracked down to this > bug). Any ways, I had to go to git full history to pick the exact > commit - I have left it in the patch. > > Hmm so probably these boot problems which only occur when your debug cable is not attached? Is there any connection with commits like this: ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo So would the potential problem be more probable by patches like the that one mentioned above? Regards, Andreas
On 16:38-20201118, Andreas Kemnade wrote: [...] > > Though theoretically, there is a possibility that the compare may > > pass and result in missing bias configuration(and resulting system > > will be unstable), I have'nt heard of actual report (but, it will be > > surprising if any actual instability was actually tracked down to this > > bug). Any ways, I had to go to git full history to pick the exact > > commit - I have left it in the patch. > > > > > Hmm so probably these boot problems which only occur when your debug > cable is not attached? If you mean a timing problem introduced by connecting jtag as in a timing behavior? short answer: NO. Long winded answer: To understand the bug, we need to understand what is being programmed, why is there a comparison and then we understand why this is a bug. What is bring programmed: In certain domains, Transistor in the older TI SoCs in OMAP generation were doped and had specific voltage planes to allow for very fine grained tuning of transistor switching frequency to be a factor of vdd (domain voltage) AND a bias voltage. So, if you attempt to switch the transistors at a specific frequency with either just the voltage or bias correctly configured, depending on the age of the transistor and temperature factors, transistors may start misbehaving. This is basically like a 101 of transistor physics. Voltage is the other factor that comes into play - Adaptive voltage control is yet another scheme on TI (and a lot more other SoCs now - different names, but fundamentally the same thing..) to balance the correct level so that the transistors dont go and end up in a thermal runaway scenario. (but not the context of this discussion) What is the check in the code? This is basically a check to prevent reconfiguring the ABB (Body bias) hardware if the configuration that is present on the hardware is the same as before. this may be the same index, OR could be that multiple indices have the same programming configuration. It is the second part (multiple indices have the same programming configuration) that is of interest for this bug. Why: The ABB hardware blocks' Bias programming involves a set of steps that can take too long OR can put the state of the hardware into intermediate state which may be unstable as the transition of clock frequencies OR voltage takes place (this is the transients / short term intermediate configuration "glitches" that we want to avoid). Now, the problem happens when you compare index[-EINVAL] basically array negative offset against a valid array, you are comparing content, so it is not a timing related problem. if it so happens that the content compared was same (very very very unlikely that after all linkage etc.. -EINVAL offset will exactly matchup .. but lets say you use address randomizer of some sort.. then there is a chance that some unrelated address content may happen to have the same content as comparison point, yes..).. anyways, if the compare matches, the logic wrongly assumes.. hey, this was already a configured ABB bias and ok, lets skip programming it.. Now you end up with freq and voltage correct and ABB wrongly configured for the duration.. situation does'nt exactly recover itself till a mismatches content index is compared. So, no, unless we are using debug cable to artifically create the mismatch, you should'nt see a difference of debug vs no debug cable for this bug at least. > > Is there any connection with commits like this: > ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo Side note: Help lookup with commit ID.. 341afbc9ea39 ("ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo") > > So would the potential problem be more probable by patches like the > that one mentioned above? Short answer: no. Long winded answer: That is actually fixing the problem where one part of the problem (enabling ABB is being done for higher OPPs) - we will also need AVS class 1.5 for a complete solution. Honestly, I have'nt had time to pay closer attention to what was going on there, but note that bias along with the correct voltage is the functional state for the transistor the correct switching frequency. Remember: these are configurations that TI puts on chip assuming all worst case operational conditions AND adding a margin on top.. Very very few devices in the broad spectrum of usage ever come and operate in that "bad space". So, we may have some level of fall outs in probably corner of the bell curve in the part distributions in the process.. Chances are very high that the typical device most folks have are "nominal" and not in the tail end of the process distribution, so, most of the users can probably survive having a "non-optimal" configuration. So the combination probability: you are operating in a corner process lot in the worst possible condition + your invalid array index pointed memory content exactly matches the content of valid index memory content, hence skipping a bias programming... Ummm.. If I was a betting man, I'd probably bet against it.. But, hey, be a bit conservative at times, who knows..I think there are few of these devices that have flown off to space, right?
On Wed, 18 Nov 2020 08:50:09 -0600, Nishanth Menon wrote: > At the start of driver initialization, we do not know what bias > setting the bootloader has configured the system for and we only know > for certain the very first time we do a transition. > > However, since the initial value of the comparison index is -EINVAL, > this negative value results in an array out of bound access on the > very first transition. > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/1] regulator: ti-abb: Fix array out of bound read access on the first transition commit: 2ba546ebe0ce2af47833d8912ced9b4a579f13cb All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
diff --git a/drivers/regulator/ti-abb-regulator.c b/drivers/regulator/ti-abb-regulator.c index 3e60bff76194..9f0a4d50cead 100644 --- a/drivers/regulator/ti-abb-regulator.c +++ b/drivers/regulator/ti-abb-regulator.c @@ -342,8 +342,17 @@ static int ti_abb_set_voltage_sel(struct regulator_dev *rdev, unsigned sel) return ret; } - /* If data is exactly the same, then just update index, no change */ info = &abb->info[sel]; + /* + * When Linux kernel is starting up, we are'nt sure of the + * Bias configuration that bootloader has configured. + * So, we get to know the actual setting the first time + * we are asked to transition. + */ + if (abb->current_info_idx == -EINVAL) + goto just_set_abb; + + /* If data is exactly the same, then just update index, no change */ oinfo = &abb->info[abb->current_info_idx]; if (!memcmp(info, oinfo, sizeof(*info))) { dev_dbg(dev, "%s: Same data new idx=%d, old idx=%d\n", __func__, @@ -351,6 +360,7 @@ static int ti_abb_set_voltage_sel(struct regulator_dev *rdev, unsigned sel) goto out; } +just_set_abb: ret = ti_abb_set_opp(rdev, abb, info); out: