[0/2] iommu/dmar: expose fault counters via sysfs

Message ID 20191015151112.17225-1-volchkov@amazon.de (mailing list archive)

Message

Yuri Volchkov Oct. 15, 2019, 3:11 p.m. UTC
For health monitoring, it can be useful to know whether the IOMMU is
behaving as expected. DMAR faults can be an indicator that a device:
 - has been misconfigured, or
 - has experienced a hardware hiccup and replacement should
   be considered, or
 - has been issuing faults due to malicious activity

Currently the only way to check whether any DMAR faults have occurred
on the host is to scan the dmesg output. This approach is fragile: the
messages we are looking for can be overwritten when the kernel log
buffer wraps, or suppressed (since it is a rate-limited print) while
another device is flooding the log.

The series adds counters for DMAR faults and exposes them via sysfs.
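
To illustrate why a counter is more reliable than a rate-limited
print, here is a minimal userspace sketch. It is not code from the
patches: struct ratelimit, struct fault_stats, report_fault() and
simulate() are hypothetical names, and the limiter models a single
interval with a fixed burst instead of the kernel's time-based
ratelimiting.

```c
#include <assert.h>

#define NDEV 2

/* Hypothetical stand-in for a rate-limited print: at most `burst`
 * messages are allowed in the (single) modelled interval. */
struct ratelimit {
	int burst;
	int printed;
};

struct fault_stats {
	unsigned long logged[NDEV];	/* faults that reached the log */
	unsigned long counted[NDEV];	/* faults seen by the counter */
};

/* Return 1 if a message may still be printed, 0 if it is dropped. */
static int ratelimit_ok(struct ratelimit *rl)
{
	if (rl->printed >= rl->burst)
		return 0;
	rl->printed++;
	return 1;
}

static void report_fault(struct ratelimit *rl, struct fault_stats *st,
			 int dev)
{
	assert(dev >= 0 && dev < NDEV);

	st->counted[dev]++;		/* the counter never drops an event */
	if (ratelimit_ok(rl))
		st->logged[dev]++;	/* the log line may be suppressed */
}

/* Device 0 floods the log, then device 1 faults exactly once. */
static struct fault_stats simulate(int burst, int noisy_faults)
{
	struct ratelimit rl = { .burst = burst, .printed = 0 };
	struct fault_stats st = { {0}, {0} };
	int i;

	for (i = 0; i < noisy_faults; i++)
		report_fault(&rl, &st, 0);
	report_fault(&rl, &st, 1);

	return st;
}
```

Calling simulate(10, 50) leaves device 1 with counted[1] == 1 but
logged[1] == 0: its only fault never reaches the log because device 0
has already used up the burst, while the counter still records it.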

Yuri Volchkov (2):
  iommu/dmar: collect fault statistics
  iommu/dmar: catch early fault occurrences

 drivers/iommu/dmar.c        | 182 ++++++++++++++++++++++++++++++++----
 drivers/iommu/intel-iommu.c |   1 +
 drivers/pci/pci-sysfs.c     |  20 ++++
 include/linux/intel-iommu.h |   4 +
 include/linux/pci.h         |  11 +++
 5 files changed, 201 insertions(+), 17 deletions(-)

Comments

Baolu Lu Oct. 16, 2019, 12:45 a.m. UTC | #1
Hi,

On 10/15/19 11:11 PM, Yuri Volchkov wrote:
> For health monitoring, it can be useful to know if iommu is behaving as
> expected. DMAR faults can be an indicator that a device:
>   - has been misconfigured, or
>   - has experienced a hardware hiccup and replacement should
>     be considered, or
>   - has been issuing faults due to malicious activity
> 
> Currently the only way to check if there were any DMAR faults on the
> host is to scan the dmesg output. However this approach is not very
> elegant. The information we are looking for can be wrapped out of the
> buffer, or masked (since it is a rate-limited print) by another
> device.
> 
> The series adds counters for DMAR faults and exposes them via sysfs.
> 

We now have an IOMMU API named iommu_register_fault_handler() to
register callbacks for DMAR faults. How about monitoring DMAR faults
through this API, so that your code could be generic and
vendor-agnostic?

Best regards,
Baolu

> Yuri Volchkov (2):
>    iommu/dmar: collect fault statistics
>    iommu/dmar: catch early fault occurrences
> 
>   drivers/iommu/dmar.c        | 182 ++++++++++++++++++++++++++++++++----
>   drivers/iommu/intel-iommu.c |   1 +
>   drivers/pci/pci-sysfs.c     |  20 ++++
>   include/linux/intel-iommu.h |   4 +
>   include/linux/pci.h         |  11 +++
>   5 files changed, 201 insertions(+), 17 deletions(-)
>