Message ID | 20230615160328.419610-4-yazen.ghannam@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | Enhance AMD SMN Error Checking | expand |
On 6/15/23 09:03, Yazen Ghannam wrote: > Check the return value of amd_smn_read() before saving a value. This > ensures invalid values aren't saved or used. > > There are three cases here with slightly different behavior. > > 1) read_tempreg_nb_zen(): > This is a function pointer which does not include a return code. > In this case, set the register value to 0 on failure. This > enforces Read-as-Zero behavior. > > 2) k10temp_read_temp(): > This function does have return codes, so return the error code > from the failed register read. Continued operation is not > necessary, since there is no valid data from the register. > Furthermore, if the register value was set to 0, then the > following operation would underflow. > > 3) k10temp_get_ccd_support(): > This function reads the same register from multiple CCD > instances in a loop. And a bitmask is formed if a specific bit > is set in each register instance. The loop should continue on a > failed register read, skipping the bit check. > > Furthermore, the __must_check attribute will be added to amd_smn_read(). > Therefore, this change is required to avoid compile-time warnings. > > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> > Cc: stable@vger.kernel.org Acked-by: Guenter Roeck <linux@roeck-us.net> > --- > Link: > https://lore.kernel.org/r/20230516202430.4157216-4-yazen.ghannam@amd.com > > v1->v2: > * Address comments from Guenter. > > drivers/hwmon/k10temp.c | 36 +++++++++++++++++++++++++++--------- > 1 file changed, 27 insertions(+), 9 deletions(-) > > diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c > index 7b177b9fbb09..70f7b77e6ece 100644 > --- a/drivers/hwmon/k10temp.c > +++ b/drivers/hwmon/k10temp.c > @@ -145,8 +145,9 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval) > > static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval) > { > - amd_smn_read(amd_pci_dev_to_node_id(pdev), > - ZEN_REPORTED_TEMP_CTRL_BASE, regval); > + if (amd_smn_read(amd_pci_dev_to_node_id(pdev), > + ZEN_REPORTED_TEMP_CTRL_BASE, regval)) > + *regval = 0; > } > > static long get_raw_temp(struct k10temp_data *data) > @@ -197,6 +198,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, > long *val) > { > struct k10temp_data *data = dev_get_drvdata(dev); > + int ret = -EOPNOTSUPP; > u32 regval; > > switch (attr) { > @@ -213,13 +215,17 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, > *val = 0; > break; > case 2 ... 13: /* Tccd{1-12} */ > - amd_smn_read(amd_pci_dev_to_node_id(data->pdev), > - ZEN_CCD_TEMP(data->ccd_offset, channel - 2), > - ®val); > + ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev), > + ZEN_CCD_TEMP(data->ccd_offset, channel - 2), > + ®val); > + > + if (ret) > + return ret; > + > *val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000; > break; > default: > - return -EOPNOTSUPP; > + return ret; > } > break; > case hwmon_temp_max: > @@ -235,7 +241,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, > - ((regval >> 24) & 0xf)) * 500 + 52000; > break; > default: > - return -EOPNOTSUPP; > + return ret; > } > return 0; > } > @@ -373,8 +379,20 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev, > int i; > > for (i = 0; i < limit; i++) { > - amd_smn_read(amd_pci_dev_to_node_id(pdev), > - ZEN_CCD_TEMP(data->ccd_offset, i), ®val); > + /* > + * Ignore inaccessible CCDs. > + * > + * Some systems will return a register value of 0, and the TEMP_VALID > + * bit check below will naturally fail. > + * > + * Other systems will return a PCI_ERROR_RESPONSE (0xFFFFFFFF) for > + * the register value. And this will incorrectly pass the TEMP_VALID > + * bit check. > + */ > + if (amd_smn_read(amd_pci_dev_to_node_id(pdev), > + ZEN_CCD_TEMP(data->ccd_offset, i), ®val)) > + continue; > + > if (regval & ZEN_CCD_TEMP_VALID) > data->show_temp |= BIT(TCCD_BIT(i)); > }
diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c index 7b177b9fbb09..70f7b77e6ece 100644 --- a/drivers/hwmon/k10temp.c +++ b/drivers/hwmon/k10temp.c @@ -145,8 +145,9 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval) static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval) { - amd_smn_read(amd_pci_dev_to_node_id(pdev), - ZEN_REPORTED_TEMP_CTRL_BASE, regval); + if (amd_smn_read(amd_pci_dev_to_node_id(pdev), + ZEN_REPORTED_TEMP_CTRL_BASE, regval)) + *regval = 0; } static long get_raw_temp(struct k10temp_data *data) @@ -197,6 +198,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, long *val) { struct k10temp_data *data = dev_get_drvdata(dev); + int ret = -EOPNOTSUPP; u32 regval; switch (attr) { @@ -213,13 +215,17 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, *val = 0; break; case 2 ... 13: /* Tccd{1-12} */ - amd_smn_read(amd_pci_dev_to_node_id(data->pdev), - ZEN_CCD_TEMP(data->ccd_offset, channel - 2), - ®val); + ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev), + ZEN_CCD_TEMP(data->ccd_offset, channel - 2), + ®val); + + if (ret) + return ret; + *val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000; break; default: - return -EOPNOTSUPP; + return ret; } break; case hwmon_temp_max: @@ -235,7 +241,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel, - ((regval >> 24) & 0xf)) * 500 + 52000; break; default: - return -EOPNOTSUPP; + return ret; } return 0; } @@ -373,8 +379,20 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev, int i; for (i = 0; i < limit; i++) { - amd_smn_read(amd_pci_dev_to_node_id(pdev), - ZEN_CCD_TEMP(data->ccd_offset, i), ®val); + /* + * Ignore inaccessible CCDs. + * + * Some systems will return a register value of 0, and the TEMP_VALID + * bit check below will naturally fail. + * + * Other systems will return a PCI_ERROR_RESPONSE (0xFFFFFFFF) for + * the register value. And this will incorrectly pass the TEMP_VALID + * bit check. + */ + if (amd_smn_read(amd_pci_dev_to_node_id(pdev), + ZEN_CCD_TEMP(data->ccd_offset, i), ®val)) + continue; + if (regval & ZEN_CCD_TEMP_VALID) data->show_temp |= BIT(TCCD_BIT(i)); }
Check the return value of amd_smn_read() before saving a value. This ensures invalid values aren't saved or used. There are three cases here with slightly different behavior. 1) read_tempreg_nb_zen(): This is a function pointer which does not include a return code. In this case, set the register value to 0 on failure. This enforces Read-as-Zero behavior. 2) k10temp_read_temp(): This function does have return codes, so return the error code from the failed register read. Continued operation is not necessary, since there is no valid data from the register. Furthermore, if the register value was set to 0, then the following operation would underflow. 3) k10temp_get_ccd_support(): This function reads the same register from multiple CCD instances in a loop. And a bitmask is formed if a specific bit is set in each register instance. The loop should continue on a failed register read, skipping the bit check. Furthermore, the __must_check attribute will be added to amd_smn_read(). Therefore, this change is required to avoid compile-time warnings. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: stable@vger.kernel.org --- Link: https://lore.kernel.org/r/20230516202430.4157216-4-yazen.ghannam@amd.com v1->v2: * Address comments from Guenter. drivers/hwmon/k10temp.c | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-)