Message ID | 20241008221657.1130181-7-terry.bowman@amd.com |
---|---|
State | Superseded |
Series | Enable CXL PCIe port protocol error handling and logging |
On Tue, 8 Oct 2024 17:16:48 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> The CXL AER service will be updated to support CXL PCIe port error
> handling in the future. These devices will use a system panic during
> recovery handling.

Recovery handling by panic? :) That's an interesting form of recovery..

>
> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  include/linux/pci.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 4cf89a4b4cbc..6f7e7371161d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -857,6 +857,9 @@ enum pci_ers_result {
>
>  	/* No AER capabilities registered for the driver */
>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> +	/* Device state requires system panic */
> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>  };
>
>  /* PCI bus error event callbacks */
On 10/16/24 11:30, Jonathan Cameron wrote:
> On Tue, 8 Oct 2024 17:16:48 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> The CXL AER service will be updated to support CXL PCIe port error
>> handling in the future. These devices will use a system panic during
>> recovery handling.
>
> Recovery handling by panic? :) That's an interesting form of recovery..
>

Yes, Dan requested all UCE (fatal and non-fatal) are handled by panic in order
to limit the blast radius of corruption in the case of UCE.

The recovery logic in cxl_do_recovery() (not using the panic) is also tested as well.

Regards,
Terry

>>
>> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  include/linux/pci.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 4cf89a4b4cbc..6f7e7371161d 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -857,6 +857,9 @@ enum pci_ers_result {
>>
>>  	/* No AER capabilities registered for the driver */
>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>> +
>> +	/* Device state requires system panic */
>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>  };
>>
>>  /* PCI bus error event callbacks */
>
On Wed, 16 Oct 2024 12:31:35 -0500
Terry Bowman <Terry.Bowman@amd.com> wrote:

> On 10/16/24 11:30, Jonathan Cameron wrote:
> > On Tue, 8 Oct 2024 17:16:48 -0500
> > Terry Bowman <terry.bowman@amd.com> wrote:
> >
> >> The CXL AER service will be updated to support CXL PCIe port error
> >> handling in the future. These devices will use a system panic during
> >> recovery handling.
> >
> > Recovery handling by panic? :) That's an interesting form of recovery..
> >
>
> Yes, Dan requested all UCE (fatal and non-fatal) are handled by panic in order
> to limit the blast radius of corruption in the case of UCE.

That's fair enough. Maybe it should be called attempted recovery handling ;)

This is fine.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Jonathan

>
> The recovery logic in cxl_do_recovery() (not using the panic) is also tested as well.
>
> Regards,
> Terry
>
> >>
> >> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> >> ---
> >>  include/linux/pci.h | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 4cf89a4b4cbc..6f7e7371161d 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -857,6 +857,9 @@ enum pci_ers_result {
> >>
> >>  	/* No AER capabilities registered for the driver */
> >>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> >> +
> >> +	/* Device state requires system panic */
> >> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> >>  };
> >>
> >>  /* PCI bus error event callbacks */
> >
Hi Jonathan,

On 10/17/2024 8:31 AM, Jonathan Cameron wrote:
> On Wed, 16 Oct 2024 12:31:35 -0500
> Terry Bowman <Terry.Bowman@amd.com> wrote:
>
>> On 10/16/24 11:30, Jonathan Cameron wrote:
>>> On Tue, 8 Oct 2024 17:16:48 -0500
>>> Terry Bowman <terry.bowman@amd.com> wrote:
>>>
>>>> The CXL AER service will be updated to support CXL PCIe port error
>>>> handling in the future. These devices will use a system panic during
>>>> recovery handling.
>>>
>>> Recovery handling by panic? :) That's an interesting form of recovery..
>>>
>>
>> Yes, Dan requested all UCE (fatal and non-fatal) are handled by panic in order
>> to limit the blast radius of corruption in the case of UCE.
> That's fair enough. Maybe it should be called attempted recovery handling ;)
>
> This is fine.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Jonathan
>

I'll add "attempted" recovery to the commit message.

Regards,
Terry

>>
>> The recovery logic in cxl_do_recovery() (not using the panic) is also tested as well.
>>
>> Regards,
>> Terry
>>
>>>>
>>>> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
>>>>
>>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>>> ---
>>>>  include/linux/pci.h | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 4cf89a4b4cbc..6f7e7371161d 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -857,6 +857,9 @@ enum pci_ers_result {
>>>>
>>>>  	/* No AER capabilities registered for the driver */
>>>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>>>> +
>>>> +	/* Device state requires system panic */
>>>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>>>  };
>>>>
>>>>  /* PCI bus error event callbacks */
>>>
>
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4cf89a4b4cbc..6f7e7371161d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -857,6 +857,9 @@ enum pci_ers_result {
 
 	/* No AER capabilities registered for the driver */
 	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
+
+	/* Device state requires system panic */
+	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
 };
 
 /* PCI bus error event callbacks */
The CXL AER service will be updated to support CXL PCIe port error
handling in the future. These devices will use a system panic during
recovery handling.

Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 include/linux/pci.h | 3 +++
 1 file changed, 3 insertions(+)
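
This patch only adds the enumerator; the code that returns or consumes it is not part of this change. As a rough illustration of how the new value might be used, here is a userspace sketch, not kernel code. The enum values mirror include/linux/pci.h with the __bitwise/__force annotations dropped. The functions mock_cxl_port_error_detected() and mock_recovery_should_panic() are hypothetical names invented for this sketch, and the choice of PCI_ERS_RESULT_CAN_RECOVER for correctable errors is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace mock of the kernel's pci_ers_result values. The kernel
 * declares these with a __bitwise typedef and __force casts, which are
 * dropped here so the sketch compiles outside the kernel tree.
 */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE = 1,
	PCI_ERS_RESULT_CAN_RECOVER = 2,
	PCI_ERS_RESULT_NEED_RESET = 3,
	PCI_ERS_RESULT_DISCONNECT = 4,
	PCI_ERS_RESULT_RECOVERED = 5,
	PCI_ERS_RESULT_NO_AER_DRIVER = 6,
	/* Value added by this patch */
	PCI_ERS_RESULT_PANIC = 7,
};

/*
 * Hypothetical error_detected-style callback for a CXL port. Following
 * the thread, every uncorrectable error (fatal or non-fatal) asks for a
 * panic; the correctable-case return value is an assumption.
 */
enum pci_ers_result mock_cxl_port_error_detected(bool uncorrectable)
{
	return uncorrectable ? PCI_ERS_RESULT_PANIC
			     : PCI_ERS_RESULT_CAN_RECOVER;
}

/*
 * Hypothetical check a recovery path could make on a callback's result.
 * Returns true where a real kernel caller would presumably call panic().
 */
bool mock_recovery_should_panic(enum pci_ers_result result)
{
	/* Guard against values outside the enum's defined range. */
	assert(result >= PCI_ERS_RESULT_NONE && result <= PCI_ERS_RESULT_PANIC);
	return result == PCI_ERS_RESULT_PANIC;
}
```

In this sketch, passing the result of mock_cxl_port_error_detected(true) to mock_recovery_should_panic() yields true, while any of the pre-existing result values yields false, so existing recovery behaviour is unchanged for drivers that never return the new value.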