diff mbox

acpi: apei: fix the wrongly parse generic error status block

Message ID 1502388412-20273-1-git-send-email-gengdongjiu@huawei.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Dongjiu Geng Aug. 10, 2017, 6:06 p.m. UTC
The revision 0x300 generic error data entry is different with the old
version. when ghes_do_proc traverses to get the data entry, it does not
consider this difference. so when error status block has revision 0x300
data entry, it will have issue.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 drivers/acpi/apei/apei-internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Tyler Baicar Aug. 10, 2017, 5:48 p.m. UTC | #1
Hello,

I have already posted a patch fixing this. Please see:

https://lkml.org/lkml/2017/8/3/824

This makes the loop identical to the CPER code which prints the estatus 
blocks to the kernel logs.

Thanks,

Tyler


On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
> The revision 0x300 generic error data entry is different with the old
> version. when ghes_do_proc traverses to get the data entry, it does not
> consider this difference. so when error status block has revision 0x300
> data entry, it will have issue.
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
>   drivers/acpi/apei/apei-internal.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
> index 6e9f14c0a71b..6491f1c4a96e 100644
> --- a/drivers/acpi/apei/apei-internal.h
> +++ b/drivers/acpi/apei/apei-internal.h
> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>   
>   #define apei_estatus_for_each_section(estatus, section)			\
>   	for (section = (struct acpi_hest_generic_data *)(estatus + 1);	\
> -	     (void *)section - (void *)estatus < estatus->data_length;	\
> -	     section = (void *)(section+1) + section->error_data_length)
> +	     (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> +	     section = acpi_hest_get_next(section))
>   
>   static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>   {
Dongjiu Geng Aug. 10, 2017, 9:31 p.m. UTC | #2
Hello,

   sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
can be used by other place to avoid duplicated code, such as prints the estatus blocks.

On 2017/8/11 1:48, Baicar, Tyler wrote:
> Hello,
> 
> I have already posted a patch fixing this. Please see:
> 
> https://lkml.org/lkml/2017/8/3/824
> 
> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
> 
> Thanks,
> 
> Tyler
> 
> 
> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different with the old
>> version. when ghes_do_proc traverses to get the data entry, it does not
>> consider this difference. so when error status block has revision 0x300
>> data entry, it will have issue.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> ---
>>   drivers/acpi/apei/apei-internal.h | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>> index 6e9f14c0a71b..6491f1c4a96e 100644
>> --- a/drivers/acpi/apei/apei-internal.h
>> +++ b/drivers/acpi/apei/apei-internal.h
>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>     #define apei_estatus_for_each_section(estatus, section)            \
>>       for (section = (struct acpi_hest_generic_data *)(estatus + 1);    \
>> -         (void *)section - (void *)estatus < estatus->data_length;    \
>> -         section = (void *)(section+1) + section->error_data_length)
>> +         (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>> +         section = acpi_hest_get_next(section))
>>     static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>>   {
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongjiu Geng Aug. 10, 2017, 9:37 p.m. UTC | #3
may be directly remove the macro apei_estatus_for_each_section is not better, if other place code also
needs to iterate through the GHES estatus blocks, it will be repeated written again.


On 2017/8/11 5:31, gengdongjiu wrote:
> Hello,
> 
>    sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
> can be used by other place to avoid duplicated code, such as prints the estatus blocks.
> 
> On 2017/8/11 1:48, Baicar, Tyler wrote:
>> Hello,
>>
>> I have already posted a patch fixing this. Please see:
>>
>> https://lkml.org/lkml/2017/8/3/824
>>
>> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
>>
>> Thanks,
>>
>> Tyler
>>
>>
>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>> The revision 0x300 generic error data entry is different with the old
>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>> consider this difference. so when error status block has revision 0x300
>>> data entry, it will have issue.
>>>
>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>>> ---
>>>   drivers/acpi/apei/apei-internal.h | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>> --- a/drivers/acpi/apei/apei-internal.h
>>> +++ b/drivers/acpi/apei/apei-internal.h
>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>>     #define apei_estatus_for_each_section(estatus, section)            \
>>>       for (section = (struct acpi_hest_generic_data *)(estatus + 1);    \
>>> -         (void *)section - (void *)estatus < estatus->data_length;    \
>>> -         section = (void *)(section+1) + section->error_data_length)
>>> +         (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>> +         section = acpi_hest_get_next(section))
>>>     static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>>>   {
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tyler Baicar Aug. 11, 2017, 1:19 p.m. UTC | #4
I removed the apei_estatus_for_each_section because it was only being 
used in this one spot even though several other parts of the code do the 
same iteration (it is done several times in the CPER code). I made this 
iteration match the CPER iterations because the CPER iterations are 
verifying that the structures are all valid and lengths are correct. If 
those checks are being done this way, it makes the most sense to mimic 
that iteration here when calling into EDAC and triggering trace events.

Thanks,

Tyler


On 8/10/2017 3:37 PM, gengdongjiu wrote:
> may be directly remove the macro apei_estatus_for_each_section is not better, if other place code also
> needs to iterate through the GHES estatus blocks, it will be repeated written again.
>
>
> On 2017/8/11 5:31, gengdongjiu wrote:
>> Hello,
>>
>>     sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
>> can be used by other place to avoid duplicated code, such as prints the estatus blocks.
>>
>> On 2017/8/11 1:48, Baicar, Tyler wrote:
>>> Hello,
>>>
>>> I have already posted a patch fixing this. Please see:
>>>
>>> https://lkml.org/lkml/2017/8/3/824
>>>
>>> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
>>>
>>> Thanks,
>>>
>>> Tyler
>>>
>>>
>>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>>> The revision 0x300 generic error data entry is different with the old
>>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>>> consider this difference. so when error status block has revision 0x300
>>>> data entry, it will have issue.
>>>>
>>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>>>> ---
>>>>    drivers/acpi/apei/apei-internal.h | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>>> --- a/drivers/acpi/apei/apei-internal.h
>>>> +++ b/drivers/acpi/apei/apei-internal.h
>>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>>>      #define apei_estatus_for_each_section(estatus, section)            \
>>>>        for (section = (struct acpi_hest_generic_data *)(estatus + 1);    \
>>>> -         (void *)section - (void *)estatus < estatus->data_length;    \
>>>> -         section = (void *)(section+1) + section->error_data_length)
>>>> +         (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>>> +         section = acpi_hest_get_next(section))
>>>>      static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>>>>    {
gengdongjiu Aug. 11, 2017, 11:41 p.m. UTC | #5
2017-08-11 21:19 GMT+08:00 Baicar, Tyler <tbaicar@codeaurora.org>:
> I removed the apei_estatus_for_each_section because it was only being used
> in this one spot even though several other parts of the code do the same
> iteration (it is done several times in the CPER code). I made this iteration
> match the CPER iterations because the CPER iterations are verifying that the
> structures are all valid and lengths are correct. If those checks are being
> done this way, it makes the most sense to mimic that iteration here when
> calling into EDAC and triggering trace events.

I think the macro includes the verification for the structures and
lengths correction, it it does not correct, it will breadk the loop.
I do not see your modifcation does some special validation, it almost
smilar with the macro does.
in all the code there are three functions to do the iteration.
ghes_do_proc/cper_estatus_print/cper_estatus_check
the cper_estatus_check function is especial, because its  purpose is
to validate CPER, so it will  check every length. but not all
function's purpose is to check, for example cper_estatus_print, as
shown below, so we can use this macro to clear the code. Now we can
see there are two function use it, but in the future, if want to
iterate CPER, we can use the macro if it does not want to do special
thing.

#define apei_estatus_for_each_section(estatus, section) \
for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
(void *)section - (void *)(estatus + 1) < estatus->data_length; \
------> here it will check whether length is valid, if not,  it will
break the loop
section = acpi_hest_get_next(section))


Original code:
void cper_estatus_print(const char *pfx,
   const struct acpi_hest_generic_status *estatus)
{
 struct acpi_hest_generic_data *gdata;
 unsigned int data_len;
 int sec_no = 0;
 char newpfx[64];
 __u16 severity;
 severity = estatus->error_severity;
 if (severity == CPER_SEV_CORRECTED)
  printk("%s%s\n", pfx,
         "It has been corrected by h/w "
         "and requires no further action");
 printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 data_len = estatus->data_length;
 gdata = (struct acpi_hest_generic_data *)(estatus + 1);
 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 while (data_len >= acpi_hest_get_size(gdata)) {
  cper_estatus_print_section(newpfx, gdata, sec_no);
  data_len -= acpi_hest_get_record_size(gdata);
  gdata = acpi_hest_get_next(gdata);
  sec_no++;
 }
}

Can change to:
void cper_estatus_print(const char *pfx,
   const struct acpi_hest_generic_status *estatus)
{
 struct acpi_hest_generic_data *gdata;
 int sec_no = 0;
 char newpfx[64];
 __u16 severity;
 severity = estatus->error_severity;
 if (severity == CPER_SEV_CORRECTED)
  printk("%s%s\n", pfx,
         "It has been corrected by h/w "
         "and requires no further action");
 printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 apei_estatus_for_each_section {
  cper_estatus_print_section(newpfx, gdata, sec_no);
  sec_no++;
 }
}



>
> Thanks,
>
> Tyler
>
>
> On 8/10/2017 3:37 PM, gengdongjiu wrote:
>>
>> may be directly remove the macro apei_estatus_for_each_section is not
>> better, if other place code also
>> needs to iterate through the GHES estatus blocks, it will be repeated
>> written again.
>>
>>
>> On 2017/8/11 5:31, gengdongjiu wrote:
>>>
>>> Hello,
>>>
>>>     sorry, I do not see that. Just know I have reviewed your
>>> modification, may be my change can be simpleness and reserve the macro of
>>> apei_estatus_for_each_section
>>> can be used by other place to avoid duplicated code, such as prints the
>>> estatus blocks.
>>>
>>> On 2017/8/11 1:48, Baicar, Tyler wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have already posted a patch fixing this. Please see:
>>>>
>>>> https://lkml.org/lkml/2017/8/3/824
>>>>
>>>> This makes the loop identical to the CPER code which prints the estatus
>>>> blocks to the kernel logs.
>>>>
>>>> Thanks,
>>>>
>>>> Tyler
>>>>
>>>>
>>>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>>>>
>>>>> The revision 0x300 generic error data entry is different with the old
>>>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>>>> consider this difference. so when error status block has revision 0x300
>>>>> data entry, it will have issue.
>>>>>
>>>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>>>>> ---
>>>>>    drivers/acpi/apei/apei-internal.h | 4 ++--
>>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/acpi/apei/apei-internal.h
>>>>> b/drivers/acpi/apei/apei-internal.h
>>>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>>>> --- a/drivers/acpi/apei/apei-internal.h
>>>>> +++ b/drivers/acpi/apei/apei-internal.h
>>>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>>>>      #define apei_estatus_for_each_section(estatus, section)
>>>>> \
>>>>>        for (section = (struct acpi_hest_generic_data *)(estatus + 1);
>>>>> \
>>>>> -         (void *)section - (void *)estatus < estatus->data_length;
>>>>> \
>>>>> -         section = (void *)(section+1) + section->error_data_length)
>>>>> +         (void *)section - (void *)(estatus + 1) <
>>>>> estatus->data_length; \
>>>>> +         section = acpi_hest_get_next(section))
>>>>>      static inline u32 cper_estatus_len(struct acpi_hest_generic_status
>>>>> *estatus)
>>>>>    {
>
>
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tyler Baicar Aug. 14, 2017, 2:04 p.m. UTC | #6
On 8/11/2017 5:41 PM, gengdongjiu wrote:
> 2017-08-11 21:19 GMT+08:00 Baicar, Tyler <tbaicar@codeaurora.org>:
>> I removed the apei_estatus_for_each_section because it was only being used
>> in this one spot even though several other parts of the code do the same
>> iteration (it is done several times in the CPER code). I made this iteration
>> match the CPER iterations because the CPER iterations are verifying that the
>> structures are all valid and lengths are correct. If those checks are being
>> done this way, it makes the most sense to mimic that iteration here when
>> calling into EDAC and triggering trace events.
> I think the macro includes the verification for the structures and
> lengths correction, it it does not correct, it will breadk the loop.
> I do not see your modifcation does some special validation, it almost
> smilar with the macro does.
> in all the code there are three functions to do the iteration.
> ghes_do_proc/cper_estatus_print/cper_estatus_check
> the cper_estatus_check function is especial, because its  purpose is
> to validate CPER, so it will  check every length. but not all
> function's purpose is to check, for example cper_estatus_print, as
> shown below, so we can use this macro to clear the code. Now we can
> see there are two function use it, but in the future, if want to
> iterate CPER, we can use the macro if it does not want to do special
> thing.
>
> #define apei_estatus_for_each_section(estatus, section) \
> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
> (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> ------> here it will check whether length is valid, if not,  it will
> break the loop
> section = acpi_hest_get_next(section))
>
>
> Original code:
> void cper_estatus_print(const char *pfx,
>     const struct acpi_hest_generic_status *estatus)
> {
>   struct acpi_hest_generic_data *gdata;
>   unsigned int data_len;
>   int sec_no = 0;
>   char newpfx[64];
>   __u16 severity;
>   severity = estatus->error_severity;
>   if (severity == CPER_SEV_CORRECTED)
>    printk("%s%s\n", pfx,
>           "It has been corrected by h/w "
>           "and requires no further action");
>   printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
>   data_len = estatus->data_length;
>   gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>   while (data_len >= acpi_hest_get_size(gdata)) {
>    cper_estatus_print_section(newpfx, gdata, sec_no);
>    data_len -= acpi_hest_get_record_size(gdata);
>    gdata = acpi_hest_get_next(gdata);
>    sec_no++;
>   }
> }
>
> Can change to:
> void cper_estatus_print(const char *pfx,
>     const struct acpi_hest_generic_status *estatus)
> {
>   struct acpi_hest_generic_data *gdata;
>   int sec_no = 0;
>   char newpfx[64];
>   __u16 severity;
>   severity = estatus->error_severity;
>   if (severity == CPER_SEV_CORRECTED)
>    printk("%s%s\n", pfx,
>           "It has been corrected by h/w "
>           "and requires no further action");
>   printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>   apei_estatus_for_each_section {
>    cper_estatus_print_section(newpfx, gdata, sec_no);
>    sec_no++;
>   }
> }
This change works too, I think it just makes sense to have the 
iterations in the CPER and GHES code match. Do you want to add a patch 
to your patch here to change the CPER code as well? If so, I'll wait for 
that and test it out.

Thanks,
Tyler

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongjiu Geng Aug. 15, 2017, 1:14 a.m. UTC | #7
Hi,Tyler,
   Yes, I will add a patch based on it, thanks a lot that you will also have a test.


On 2017/8/14 22:04, Baicar, Tyler wrote:
> This change works too, I think it just makes sense to have the iterations in the CPER and GHES code match. Do you want to add a patch to your patch here to change the CPER code as well? If so, I'll wait for that and test it out.
> 
> Thanks,
> Tyler

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
index 6e9f14c0a71b..6491f1c4a96e 100644
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -122,8 +122,8 @@  struct dentry *apei_get_debugfs_dir(void);
 
 #define apei_estatus_for_each_section(estatus, section)			\
 	for (section = (struct acpi_hest_generic_data *)(estatus + 1);	\
-	     (void *)section - (void *)estatus < estatus->data_length;	\
-	     section = (void *)(section+1) + section->error_data_length)
+	     (void *)section - (void *)(estatus + 1) < estatus->data_length; \
+	     section = acpi_hest_get_next(section))
 
 static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
 {