Message ID | 20240401025620.205068-4-frank@crawford.emu.id.au (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | hwmon (it87): Correct handling for configuration mode | expand |
On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote: > Major part of the change for the new method to avoid chipset issues. > > Perform an intial test if the chip ID can be read without entering > configuration mode, if so, do not enter configuration mode and continue > as is. > > If chip ID cannot be read, enter configuration mode, and continue with > previous code. > > Also update exit code to take account of if we entered configuration > mode or not. > You describe the changes, but you don't describe the problem you are trying to solve. Even if configuration mode was already entered, that is not a reason to keep it active. We don't _know_ what is going on outside the driver and can not make assumptions. For changes like this you really need to explain the problem you are trying to solve, and the reasoning behind the changes. Just assuming that a set of chips would have its SIO mode enabled by the BIOS is just wrong. We don't know what if anything the BIOS is doing. Guenter
On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote: > On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote: > > Major part of the change for the new method to avoid chipset > > issues. > > > > Perform an intial test if the chip ID can be read without entering > > configuration mode, if so, do not enter configuration mode and > > continue > > as is. > > > > If chip ID cannot be read, enter configuration mode, and continue > > with > > previous code. > > > > Also update exit code to take account of if we entered > > configuration > > mode or not. > > > > You describe the changes, but you don't describe the problem you are > trying to solve. Even if configuration mode was already entered, that > is not a reason to keep it active. We don't _know_ what is going on > outside the driver and can not make assumptions. For changes like > this > you really need to explain the problem you are trying to solve, and > the > reasoning behind the changes. Just assuming that a set of chips would > have its SIO mode enabled by the BIOS is just wrong. We don't know > what if anything the BIOS is doing. In this regard, it is no different to the current code, which runs SIO mode enable, but does not ever run the SIO mode disable code. In fact this code is actually safer than the previous code in that it acts similar to the chip not being enabled or disabled, which would happen if no driver existed. > > Guenter > Regards Frank
On Thu, Apr 11, 2024 at 09:08:19PM +1000, Frank Crawford wrote: > On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote: > > On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote: > > > Major part of the change for the new method to avoid chipset > > > issues. > > > > > > Perform an intial test if the chip ID can be read without entering > > > configuration mode, if so, do not enter configuration mode and > > > continue > > > as is. > > > > > > If chip ID cannot be read, enter configuration mode, and continue > > > with > > > previous code. > > > > > > Also update exit code to take account of if we entered > > > configuration > > > mode or not. > > > > > > > You describe the changes, but you don't describe the problem you are > > trying to solve. Even if configuration mode was already entered, that > > is not a reason to keep it active. We don't _know_ what is going on > > outside the driver and can not make assumptions. For changes like > > this > > you really need to explain the problem you are trying to solve, and > > the > > reasoning behind the changes. Just assuming that a set of chips would > > have its SIO mode enabled by the BIOS is just wrong. We don't know > > what if anything the BIOS is doing. > > In this regard, it is no different to the current code, which runs SIO > mode enable, but does not ever run the SIO mode disable code. > > In fact this code is actually safer than the previous code in that it > acts similar to the chip not being enabled or disabled, which would > happen if no driver existed. Not really. There is also the watchdog code which will happily disable SIO access after it is done, causing subsequent accesses by the hwmon driver to fail. The code also assumes that SIO access was not erroneously left enabled by some other code which we don't have any control over. You assume that the hwmon driver is the only driver accessing the chip. That is a wrong assumption. I understand that the underlying problem is really that there is no SIO access infrastructure in the kernel. In the absence of such an infrastructure we can not make any assumptions about SIO access control implemented by other drivers in the kernel, and specifically can not assume that SIO access won't be disabled by other drivers just because it was enabled when the hwmon driver probe function was running. Guenter
On Thu, 2024-04-11 at 05:20 -0700, Guenter Roeck wrote: > On Thu, Apr 11, 2024 at 09:08:19PM +1000, Frank Crawford wrote: > > On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote: > > > On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote: > > > > Major part of the change for the new method to avoid chipset > > > > issues. > > > > > > > > Perform an intial test if the chip ID can be read without > > > > entering > > > > configuration mode, if so, do not enter configuration mode and > > > > continue > > > > as is. > > > > > > > > If chip ID cannot be read, enter configuration mode, and > > > > continue > > > > with > > > > previous code. > > > > > > > > Also update exit code to take account of if we entered > > > > configuration > > > > mode or not. > > > > > > > > > > You describe the changes, but you don't describe the problem you > > > are > > > trying to solve. Even if configuration mode was already entered, > > > that > > > is not a reason to keep it active. We don't _know_ what is going > > > on > > > outside the driver and can not make assumptions. For changes like > > > this > > > you really need to explain the problem you are trying to solve, > > > and > > > the > > > reasoning behind the changes. Just assuming that a set of chips > > > would > > > have its SIO mode enabled by the BIOS is just wrong. We don't > > > know > > > what if anything the BIOS is doing. > > > > In this regard, it is no different to the current code, which runs > > SIO > > mode enable, but does not ever run the SIO mode disable code. > > > > In fact this code is actually safer than the previous code in that > > it > > acts similar to the chip not being enabled or disabled, which would > > happen if no driver existed. > > Not really. There is also the watchdog code which will happily > disable > SIO access after it is done, causing subsequent accesses by the hwmon > driver to fail. The code also assumes that SIO access was not > erroneously > left enabled by some other code which we don't have any control over. And unfortunately if I can't do anything about it, I can only ignore it. If something does come up we can see what can work out at the time. > > You assume that the hwmon driver is the only driver accessing the > chip. > That is a wrong assumption. I understand that the underlying problem > is really that there is no SIO access infrastructure in the kernel. > In the absence of such an infrastructure we can not make any > assumptions > about SIO access control implemented by other drivers in the kernel, > and specifically can not assume that SIO access won't be disabled by > other drivers just because it was enabled when the hwmon driver probe > function was running. In this case the fact that it is the second chip may mean it will not come up. While I am told that the chip is fully functional with non- hwmon functions, but currently it does look like most of those aren't used. While this won't necessarily stay this way in the future, we currently cannot do anything about it. > > Guenter Regards Frank
On Fri, Apr 12, 2024 at 08:11:11PM +1000, Frank Crawford wrote: > > > > Not really. There is also the watchdog code which will happily > > disable > > SIO access after it is done, causing subsequent accesses by the hwmon > > driver to fail. The code also assumes that SIO access was not > > erroneously > > left enabled by some other code which we don't have any control over. > > And unfortunately if I can't do anything about it, I can only ignore > it. If something does come up we can see what can work out at the > time. > > > > You assume that the hwmon driver is the only driver accessing the > > chip. > > That is a wrong assumption. I understand that the underlying problem > > is really that there is no SIO access infrastructure in the kernel. > > In the absence of such an infrastructure we can not make any > > assumptions > > about SIO access control implemented by other drivers in the kernel, > > and specifically can not assume that SIO access won't be disabled by > > other drivers just because it was enabled when the hwmon driver probe > > function was running. > > In this case the fact that it is the second chip may mean it will not > come up. While I am told that the chip is fully functional with non- > hwmon functions, but currently it does look like most of those aren't > used. While this won't necessarily stay this way in the future, we > currently cannot do anything about it. This patch affects all chips, not just the second one. If any chip is in configuration mode when instantiating this driver, configuration mode won't be enabled anymore, no matter what other drivers may or may not do. That includes situations where other drivers (or the BIOS) erroneously do not disable configuration mode. I understand your reasoning about not enabling configuration mode for certain chips, but that does not explain why it would be necessary to do this for all chips all the time. Sure, there is something we can do: Unless there is a known problem that affects _all_ chips, drop this patch. Thanks, Guenter
On Fri, 2024-04-12 at 05:48 -0700, Guenter Roeck wrote: > On Fri, Apr 12, 2024 at 08:11:11PM +1000, Frank Crawford wrote: > > > > > > Not really. There is also the watchdog code which will happily > > > disable > > > SIO access after it is done, causing subsequent accesses by the > > > hwmon > > > driver to fail. The code also assumes that SIO access was not > > > erroneously > > > left enabled by some other code which we don't have any control > > > over. > > > > And unfortunately if I can't do anything about it, I can only > > ignore > > it. If something does come up we can see what can work out at the > > time. > > > > > > You assume that the hwmon driver is the only driver accessing the > > > chip. > > > That is a wrong assumption. I understand that the underlying > > > problem > > > is really that there is no SIO access infrastructure in the > > > kernel. > > > In the absence of such an infrastructure we can not make any > > > assumptions > > > about SIO access control implemented by other drivers in the > > > kernel, > > > and specifically can not assume that SIO access won't be disabled > > > by > > > other drivers just because it was enabled when the hwmon driver > > > probe > > > function was running. > > > > In this case the fact that it is the second chip may mean it will > > not > > come up. While I am told that the chip is fully functional with > > non- > > hwmon functions, but currently it does look like most of those > > aren't > > used. While this won't necessarily stay this way in the future, we > > currently cannot do anything about it. > > This patch affects all chips, not just the second one. If any chip is > in configuration mode when instantiating this driver, configuration > mode > won't be enabled anymore, no matter what other drivers may or may not > do. > That includes situations where other drivers (or the BIOS) > erroneously > do not disable configuration mode. > > I understand your reasoning about not enabling configuration mode for > certain chips, but that does not explain why it would be necessary > to do this for all chips all the time. > > Sure, there is something we can do: Unless there is a known problem > that affects _all_ chips, drop this patch. Ahh, if that is the impression you have from my description, then I will need to improve it, as that is not the case. The actual update does the following: 1) Lock the memory, but does not perform a SIO entry (previously it would have performed an SIO entry). 2) Attempt to read the chipID. This should be safe no matter which chip we have. 3) If step (2) fails, then perform SIO entry and retry chipID read. If it fails, act similarly to prior to this patch. 4) Set the sio_data->type, similar to previously. 5) If we have not performed an SIO entry, and this is not a chip type with the NOCONF feature, then it will perform an SIO entry at this point. 6) Proceed setup as prior to this patch. 7) Any following access to the SIO registers will invoke the SIO entry and SIO exit steps unless it is a chip with the NOCONF feature set. This was set up in the previous patches in this patchset. 8) There is also some minor update to the failure exit based on if it had performed a SIO entry or not, in addition to the previous tests. > > Thanks, > Guenter Regards Frank
diff --git a/drivers/hwmon/it87.c b/drivers/hwmon/it87.c index 396c2d3afbf7..6a77f2f6e1e1 100644 --- a/drivers/hwmon/it87.c +++ b/drivers/hwmon/it87.c @@ -2667,6 +2667,27 @@ static const struct attribute_group it87_group_auto_pwm = { .is_visible = it87_auto_pwm_is_visible, }; +/* + * Original explanation: + * On various Gigabyte AM4 boards (AB350, AX370), the second Super-IO chip + * (IT8792E) needs to be in configuration mode before accessing the first + * due to a bug in IT8792E which otherwise results in LPC bus access errors. + * This needs to be done before accessing the first Super-IO chip since + * the second chip may have been accessed prior to loading this driver. + * + * The problem is also reported to affect IT8795E, which is used on X299 boards + * and has the same chip ID as IT8792E (0x8733). It also appears to affect + * systems with IT8790E, which is used on some Z97X-Gaming boards as well as + * Z87X-OC. + * + * From other information supplied: + * ChipIDs 0x8733, 0x8695 (early ID for IT87952E) and 0x8790 are intialised + * by the BIOS and left in configuration mode, and entering and/or exiting + * configuration mode is what causes the crash. + * + * The recommendation is to look up the chipID before doing any mode swap + * and then act accordingly. + */ /* SuperIO detection - will change isa_address if a chip is found */ static int __init it87_find(int sioaddr, unsigned short *address, struct it87_sio_data *sio_data, int chip_cnt) @@ -2674,16 +2695,25 @@ static int __init it87_find(int sioaddr, unsigned short *address, int err; u16 chip_type; const struct it87_devices *config = NULL; + bool opened = false; - err = superio_enter(sioaddr, false); + /* First step, lock memory but don't enter configuration mode */ + err = superio_enter(sioaddr, true); if (err) return err; err = -ENODEV; chip_type = superio_inw(sioaddr, DEVID); - /* check first for a valid chip before forcing chip id */ - if (chip_type == 0xffff) - goto exit; + /* Check for a valid chip before forcing chip id */ + if (chip_type == 0xffff) { + /* Enter configuration mode */ + __superio_enter(sioaddr); + opened = true; + /* and then try again */ + chip_type = superio_inw(sioaddr, DEVID); + if (chip_type == 0xffff) + goto exit; + } if (force_id_cnt == 1) { /* If only one value given use for all chips */ @@ -2767,6 +2797,18 @@ static int __init it87_find(int sioaddr, unsigned short *address, config = &it87_devices[sio_data->type]; + /* + * If previously we didn't enter configuration mode and it isn't a + * chip we know is initialised by the BIOS, then enter configuration + * mode. + * + * I don't know if any such chips can exist but be defensive. + */ + if (!opened && !has_conf_biosopen(config)) { + __superio_enter(sioaddr); + opened = true; + } + superio_select(sioaddr, PME); if (!(superio_inb(sioaddr, IT87_ACT_REG) & 0x01)) { pr_info("Device (chip %s ioreg 0x%x) not activated, skipping\n", @@ -3144,7 +3186,7 @@ static int __init it87_find(int sioaddr, unsigned short *address, } exit: - superio_exit(sioaddr, config ? has_conf_biosopen(config) : false); + superio_exit(sioaddr, opened && config && has_conf_biosopen(config)); return err; }
Major part of the change for the new method to avoid chipset issues. Perform an intial test if the chip ID can be read without entering configuration mode, if so, do not enter configuration mode and continue as is. If chip ID cannot be read, enter configuration mode, and continue with previous code. Also update exit code to take account of if we entered configuration mode or not. Signed-off-by: Frank Crawford <frank@crawford.emu.id.au> --- drivers/hwmon/it87.c | 52 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 47 insertions(+), 5 deletions(-)