Message ID | E1q4kX6-00BNuM-Mx@rmk-PC.armlinux.org.uk (mailing list archive) |
---|---|
State | Accepted |
Commit | 4ec7329517027db28c5683675ab3b3842ad60324 |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: phylib: fix phy_read*_poll_timeout() |
On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> + __ret = read_poll_timeout(__val = phy_read, val, \
                                                ^^^
Is this not __val on purpose?
On Thu, 1 Jun 2023 21:33:45 -0700 Jakub Kicinski wrote:
> On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> > + __ret = read_poll_timeout(__val = phy_read, val, \
>                                                 ^^^
> Is this not __val on purpose?

Yes it is :) All this to save the single line of assignment
after the read_poll_timeout() "call" ?
On Thu, Jun 01, 2023 at 09:35:09PM -0700, Jakub Kicinski wrote:
> On Thu, 1 Jun 2023 21:33:45 -0700 Jakub Kicinski wrote:
> > On Thu, 01 Jun 2023 16:48:12 +0100 Russell King (Oracle) wrote:
> > > + __ret = read_poll_timeout(__val = phy_read, val, \
> >                                                 ^^^
> > Is this not __val on purpose?
>
> Yes it is :) All this to save the single line of assignment
> after the read_poll_timeout() "call" ?

Okay, so it seems you don't like it. We can't fix it then, and we'll
have to go with the BUILD_BUG_ON() forcing all users to use a signed
variable (which better be larger than a s8 so negative errnos can fit)
or we just rely on Dan to report the problems.
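For illustration only, the BUILD_BUG_ON() alternative mentioned above could look roughly like the user-space sketch below. It uses C11 _Static_assert as a stand-in for BUILD_BUG_ON(), and is_signed_expr() and CHECK_POLL_VAL() are hypothetical names made up for this example, not existing kernel macros. It assumes a GCC or Clang compiler for __typeof__.

```c
#include <assert.h>

/* Hypothetical helper: nonzero when the type of expression x is signed. */
#define is_signed_expr(x) (((__typeof__(x))-1) < 0)

/*
 * Hypothetical compile-time check, standing in for BUILD_BUG_ON():
 * reject a poll variable that is unsigned or only one byte wide.
 */
#define CHECK_POLL_VAL(val) \
	_Static_assert(is_signed_expr(val) && sizeof(val) > 1, \
		       "poll variable must be signed and wider than s8")

/* Compiles: int is signed and wider than one byte. */
static int check_int_ok(void)
{
	int val = 0;

	CHECK_POLL_VAL(val);
	return val;
}
```

Declaring val as an unsigned short or a signed char in check_int_ok() would make the build fail, which is the burden on every caller that this approach implies.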
On Fri, 2 Jun 2023 09:53:09 +0100 Russell King (Oracle) wrote:
> > Yes it is :) All this to save the single line of assignment
> > after the read_poll_timeout() "call" ?
>
> Okay, so it seems you don't like it. We can't fix it then, and we'll
> have to go with the BUILD_BUG_ON() forcing all users to use a signed
> variable (which better be larger than a s8 so negative errnos can fit)
> or we just rely on Dan to report the problems.

Wait, did the version I proposed not work?

https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/

I just think the assignment inside the first argument is unnecessarily
unreadable. Maybe it's just me.
On Fri, Jun 02, 2023 at 09:05:39AM -0700, Jakub Kicinski wrote:
> On Fri, 2 Jun 2023 09:53:09 +0100 Russell King (Oracle) wrote:
> > > Yes it is :) All this to save the single line of assignment
> > > after the read_poll_timeout() "call" ?
> >
> > Okay, so it seems you don't like it. We can't fix it then, and we'll
> > have to go with the BUILD_BUG_ON() forcing all users to use a signed
> > variable (which better be larger than a s8 so negative errnos can fit)
> > or we just rely on Dan to report the problems.
>
> Wait, did the version I proposed not work?
>
> https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/

If we're into the business of throwing web URLs at each other for
messages we've already read, here's my one for you which contains
the explanation why your one is broken, and proposing my solution.

https://lore.kernel.org/all/ZHZmBBDSVMf1WQWI@shell.armlinux.org.uk/

To see exactly why yours is broken, see the paragraph starting
"The elephant in the room..."

If it needs yet more explanation, which clearly it does, then let's
look at what genphy_loopback is doing:

	ret = phy_read_poll_timeout(phydev, MII_BMSR, val,
				    val & BMSR_LSTATUS,
				    5000, 500000, true);

Now, with your supposed "fix" of:

+	int __ret, __val; \
+	\
+	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond), \
		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \

This ends up being:

	int __ret, __val;

	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (val & BMSR_LSTATUS),
		sleep_us, timeout_us, sleep_before_read, phydev, regnum);

and that expands to something that does this:

	__val = phy_read(phydev, regnum);
	if (__val < 0 || (val & BMSR_LSTATUS))
		break;

Can you spot the bug yet? Where does "val" for the test "val & BMSR_LSTATUS"
come from?

A bigger hint. With the existing code, this would have been:

	val = phy_read(phydev, regnum);
	if (val < 0 || (val & BMSR_LSTATUS))
		break;

See the difference? val & BMSR_LSTATUS is checking the value that was
returned from phy_read() here, but in yours, it's checking an
uninitialised variable.

With my proposal, this becomes:

	val = __val = phy_read(phydev, regnum);
	if (__val < 0 || (val & BMSR_LSTATUS))
		break;

where "val" is whatever type the user chose, which has absolutely _no_
bearing whatsoever on whether the test for __val < 0 can be correctly
evaluated, and makes that test totally independent of whatever type the
user chose.
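To make the type argument above concrete, here is a small user-space sketch of one poll iteration, assuming the caller declared "val" as a 16-bit unsigned type. fake_phy_read(), STOP_OLD(), STOP_NEW(), FAKE_EIO and FAKE_LSTATUS are hypothetical stand-ins invented for this example, and the statement-expression syntax assumes GCC or Clang. It is a simplified model, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_EIO	5	/* stand-in for an errno value */
#define FAKE_LSTATUS	0x0004	/* stand-in for a status bit such as BMSR_LSTATUS */

/* Hypothetical stand-in for phy_read(): the bus access always fails. */
static int fake_phy_read(void)
{
	return -FAKE_EIO;
}

/* Existing pattern: the error test uses the caller's variable. */
#define STOP_OLD(val, cond) \
({ \
	(val) = fake_phy_read(); \
	(val) < 0 || (cond); \
})

/* Proposed pattern: the error test uses a private signed int. */
#define STOP_NEW(val, cond) \
({ \
	int __val; \
	(val) = __val = fake_phy_read(); \
	__val < 0 || (cond); \
})

/* Returns nonzero if the poll loop would stop after a failed read. */
static int old_stops_on_error(void)
{
	uint16_t val;

	return STOP_OLD(val, val & FAKE_LSTATUS);
}

static int new_stops_on_error(void)
{
	uint16_t val;

	return STOP_NEW(val, val & FAKE_LSTATUS);
}
```

With the unsigned "val", the old form stores -5 as 0xfffb, so "val < 0" is never true and the status bit happens to be clear: the loop keeps polling. The new form stops because __val keeps the sign.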
On Fri, Jun 02, 2023 at 05:17:59PM +0100, Russell King (Oracle) wrote:
> [...]
> With my proposal, this becomes:
>
> 	val = __val = phy_read(phydev, regnum);
> 	if (__val < 0 || (val & BMSR_LSTATUS))
> 		break;
>
> where "val" is whatever type the user chose, which has absolutely _no_
> bearing what so ever on whether the test for __val < 0 can be correctly
> evaluated, and makes that test totally independent of whatever type the
> user chose.

If you don't like my solution, then I suppose another possibility would
be:

#define __phy_poll_read(phydev, regnum, val) \
({ \
	int __err; \
	__err = phy_read(phydev, regnum); \
	if (__err >= 0) \
		val = __err; \
	__err; \
})

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
				timeout_us, sleep_before_read) \
({ \
	int __ret, __err; \
	__ret = read_poll_timeout(__phy_poll_read, __err, \
				  __err < 0 || (cond), \
		sleep_us, timeout_us, sleep_before_read, phydev, regnum, val); \
	if (__err < 0) \
		__ret = __err; \
	...

but that brings with it the possibility of using an uninitialised
"val" (e.g. if phy_read() returns an error on the first iteration.)
and is way more horrid and even less easy to understand.

Remember that we default to *not* warning about uninitialised variables
when building the kernel, so this won't produce a warning - which I
guess is probably why you didn't notice that your suggestion left "val"
uninitialised.
On Fri, 2 Jun 2023 17:34:31 +0100 Russell King (Oracle) wrote:
> On Fri, Jun 02, 2023 at 05:17:59PM +0100, Russell King (Oracle) wrote:
> > On Fri, Jun 02, 2023 at 09:05:39AM -0700, Jakub Kicinski wrote:
> > > Wait, did the version I proposed not work?
> > >
> > > https://lore.kernel.org/all/20230530121910.05b9f837@kernel.org/
> >
> > If we're into the business of throwing web URLs at each other for
> > messages we've already read, here's my one for you which contains
> > the explanation why your one is broken, and proposing my solution.
> >
> > https://lore.kernel.org/all/ZHZmBBDSVMf1WQWI@shell.armlinux.org.uk/
> >
> > To see exactly why yours is broken, see the paragraph starting
> > "The elephant in the room..."

Ah, yes, sorry, I'll admit I didn't get what you mean by the elephant
paragraph when I read that.

> If you don't like my solution, then I suppose another possibility would
> be:
>
> #define __phy_poll_read(phydev, regnum, val) \
> ({ \
> 	int __err; \
> 	__err = phy_read(phydev, regnum); \
> 	if (__err >= 0) \
> 		val = __err; \
> 	__err; \
> })
>
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> 				timeout_us, sleep_before_read) \
> ({ \
> 	int __ret, __err; \
> 	__ret = read_poll_timeout(__phy_poll_read, __err, \
> 				  __err < 0 || (cond), \
> 		sleep_us, timeout_us, sleep_before_read, phydev, regnum, val); \
> 	if (__err < 0) \
> 		__ret = __err; \
> 	...
>
> but that brings with it the possibility of using an uninitialised
> "val" (e.g. if phy_read() returns an error on the first iteration.)
> and is way more horrid and even less easy to understand.
>
> Remember that we default to *not* warning about uninitialised variables
> when building the kernel, so this won't produce a warning - which I
> guess is probably why you didn't notice that your suggestion left "val"
> uninitialised.

Right :( Let's keep the patch as is.
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 01 Jun 2023 16:48:12 +0100 you wrote:
> Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
> reports that:
>
> "It is common to get this wrong in general with PHY drivers. Dan
> regularly posts fixes like this soon after a PHY driver patch it
> merged. I really wish we could somehow get the compiler to warn when
> the result from phy_read() is stored into a unsigned type. It would
> save Dan a lot of work."
>
> [...]

Here is the summary with links:
  - [net-next] net: phylib: fix phy_read*_poll_timeout()
    https://git.kernel.org/netdev/net-next/c/4ec732951702

You are awesome, thank you!
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7addde5d14c0..11c1e91563d4 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1206,10 +1206,12 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
 #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
 				timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
+	int __ret, __val; \
+	__ret = read_poll_timeout(__val = phy_read, val, \
+				  __val < 0 || (cond), \
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
-	if (val < 0) \
-		__ret = val; \
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \
@@ -1302,11 +1304,13 @@ int phy_read_mmd(struct phy_device *phydev, int devad, u32 regnum);
 #define phy_read_mmd_poll_timeout(phydev, devaddr, regnum, val, cond, \
 				  sleep_us, timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read_mmd, val, (cond) || val < 0, \
+	int __ret, __val; \
+	__ret = read_poll_timeout(__val = phy_read_mmd, val, \
+				  __val < 0 || (cond), \
 				  sleep_us, timeout_us, sleep_before_read, \
 				  phydev, devaddr, regnum); \
-	if (val < 0) \
-		__ret = val; \
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \
Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
reports that:

"It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch it
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into a unsigned type. It would
save Dan a lot of work."

Let's make phy_read*_poll_timeout() immune to further issues when "val"
is an unsigned type by storing the read function's result in a signed
int as well as "val", and using the signed variable both to check for
an error and for propagating that error to the caller.

The advantage of this method is we don't change where the cast from
the signed return code to the user's variable occurs - so users will
see no change.

Previously Heiner changed phy_read_poll_timeout() to check for an error
before evaluating the user supplied condition, but didn't update
phy_read_mmd_poll_timeout(). Make that change there too.

Link: https://lore.kernel.org/r/d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 include/linux/phy.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
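As a rough user-space model of the approach described in the commit message, the sketch below polls a scripted sequence of register reads into an unsigned caller variable while checking errors through a private signed int. Everything named fake_* or model_* / MODEL_* is hypothetical and exists only for this example; the retry counter replaces the real sleep and timeout handling, and the statement expressions assume GCC or Clang. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EIO	5	/* stand-in errno values */
#define MODEL_ETIMEDOUT	110
#define MODEL_LSTATUS	0x0004	/* stand-in for a link status bit */

/* Hypothetical PHY that returns a scripted sequence of read results. */
struct fake_phy {
	const int *reads;	/* results of successive reads */
	int nreads;
	int pos;
};

static int fake_phy_read(struct fake_phy *phy)
{
	if (phy->pos >= phy->nreads)
		return phy->reads[phy->nreads - 1];	/* repeat last value */
	return phy->reads[phy->pos++];
}

/* Simplified poll loop: bounded retries instead of sleeping on a timeout. */
#define model_read_poll(op, val, cond, max_tries, ...) \
({ \
	int __tries = (max_tries); \
	for (;;) { \
		(val) = op(__VA_ARGS__); \
		if (cond) \
			break; \
		if (--__tries <= 0) \
			break; \
	} \
	(cond) ? 0 : -MODEL_ETIMEDOUT; \
})

/* Model of the fixed macro: errors are tested and returned via __val. */
#define model_phy_read_poll(phy, val, cond, max_tries) \
({ \
	int __ret, __val; \
	__ret = model_read_poll(__val = fake_phy_read, val, \
				__val < 0 || (cond), max_tries, phy); \
	if (__val < 0) \
		__ret = __val; \
	__ret; \
})

/* Caller that (deliberately) uses an unsigned 16-bit variable. */
static int poll_link_u16(const int *reads, int nreads)
{
	struct fake_phy phy = { reads, nreads, 0 };
	uint16_t val = 0;

	return model_phy_read_poll(&phy, val, val & MODEL_LSTATUS, 3);
}
```

In this model a failed read is reported as the negative error code even though the caller's variable is unsigned, a read with the status bit set returns 0, and exhausting the retries returns the timeout code.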