diff mbox series

[v3] fpga: dfl: fme: adding reserved bits for revision of FME/Port error

Message ID 20220527073253.306501-1-tianfei.zhang@intel.com (mailing list archive)
State New
Headers show
Series [v3] fpga: dfl: fme: adding reserved bits for revision of FME/Port error | expand

Commit Message

Zhang, Tianfei May 27, 2022, 7:32 a.m. UTC
From: Tianfei zhang <tianfei.zhang@intel.com>

There are 2 different register layouts for FME/Port error
registers. The older PAC card like N3000 used the older
register layout, but to improve the scalability, the new
production like Intel PAC N6000 plans to deploy the new
register format.

To distinguish the register layouts, we provide another sysfs
node for revision info, but it is a bad method that using one
sysfs node's value to determine the usage of other sysfs node.

This patch introduces 4 reserved bits (Bit[59:56]) in error
register to store the revision value which readout from
DFH_REVISION field in DFH register, the DFH_REVSION field also
4 bits width. DFL driver appends the FME/Port error
revision info in those bits for attribution on the readout
value.

Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
---
v3:
 - no code change, just add some explanation of two register layouts in git message.
v2:
 - add documentation for this change.
 - fix the reverse xmas tree declaration.
---
 .../ABI/testing/sysfs-platform-dfl-fme        | 37 ++++++++++++-------
 drivers/fpga/dfl-fme-error.c                  | 36 +++++++++++++++---
 2 files changed, 54 insertions(+), 19 deletions(-)

Comments

Wu, Hao May 30, 2022, 2:35 a.m. UTC | #1
> Subject: [PATCH v3] fpga: dfl: fme: adding reserved bits for revision of FME/Port
> error
> 
> From: Tianfei zhang <tianfei.zhang@intel.com>
> 
> There are 2 different register layouts for FME/Port error
> registers. The older PAC card like N3000 used the older
> register layout, but to improve the scalability, the new
> production like Intel PAC N6000 plans to deploy the new
> register format.
> 
> To distinguish the register layouts, we provide another sysfs
> node for revision info, but it is a bad method that using one
> sysfs node's value to determine the usage of other sysfs node.
> 
> This patch introduces 4 reserved bits (Bit[59:56]) in error
> register to store the revision value which readout from
> DFH_REVISION field in DFH register, the DFH_REVSION field also
> 4 bits width. DFL driver appends the FME/Port error
> revision info in those bits for attribution on the readout
> value.

My current feeling is that the essential requirement now, is
that userspace requires to know what the error types on the
current FPGA card, as FPGA logic can be customized by different
user. Ideally it should expose something like GUID to userspace,
but now there is no such design, if GUID design could be finalized
then we can just expose that info via GUID sysfs interface, if this
is not possible, then I feel even revision sysfs interface is better
than this kind of hack.

> 
> Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
> ---
> v3:
>  - no code change, just add some explanation of two register layouts in git
> message.
> v2:
>  - add documentation for this change.
>  - fix the reverse xmas tree declaration.
> ---
>  .../ABI/testing/sysfs-platform-dfl-fme        | 37 ++++++++++++-------
>  drivers/fpga/dfl-fme-error.c                  | 36 +++++++++++++++---
>  2 files changed, 54 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index d6ab34e81b9b..b886568d6071 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -50,46 +50,57 @@ Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
>  Description:	Read-Write. Read this file for errors detected on pcie0 link.
> -		Write this file to clear errors logged in pcie0_errors. Write
> -		fails with -EINVAL if input parsing fails or input error code
> -		doesn't match.
> +		The readout value has embedded 4 bits revision attribution
> +		in Bit[59:56] which reserved by hardware. Write this file to
> +		clear errors logged in pcie0_errors. Clean Bit[59:56] before
> +		write this file. Write fails with -EINVAL if input parsing
> +		fails or input error code doesn't match.
> 
>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
>  Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
>  Description:	Read-Write. Read this file for errors detected on pcie1 link.
> -		Write this file to clear errors logged in pcie1_errors. Write
> -		fails with -EINVAL if input parsing fails or input error code
> -		doesn't match.
> +		The readout value has embedded 4 bits revision attribution
> +		in Bit[59:56] which reserved by hardware. Write this file to
> +		clear errors logged in pcie1_errors. Clean Bit[59:56] before
> +		write this file. Write fails with -EINVAL if input parsing fails
> +		or input error code doesn't match.
> 
>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
>  Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
> -Description:	Read-only. It returns non-fatal errors detected.
> +Description:	Read-only. It returns non-fatal errors detected. The readout
> +		value has embedded 4 bits revision attribution in Bit[59:56]
> +		which reserved by hardware.
> 
>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
>  Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
>  Description:	Read-only. It returns catastrophic and fatal errors detected.
> +		The readout value has embedded 4 bits revision attribution in
> +		Bit[59:56] which reserved by hardware.
> 
>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/inject_errors
>  Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
> -Description:	Read-Write. Read this file to check errors injected. Write this
> -		file to inject errors for testing purpose. Write fails with
> -		-EINVAL if input parsing fails or input inject error code isn't
> -		supported.
> +Description:	Read-Write. Read this file to check errors injected. The readout
> +		value has embedded 4 bits revision attribution which reserved
> by
> +		hardware. Write this file to inject errors for testing purpose.
> +		Clean Bit[59:56] before write this file. Write fails with -EINVAL
> +		if input parsing fails or input inject error code isn't supported.
> 
>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/fme_errors
>  Date:		August 2019
>  KernelVersion:  5.4
>  Contact:	Wu Hao <hao.wu@intel.com>
> -Description:	Read-Write. Read this file to get errors detected on FME.
> -		Write this file to clear errors logged in fme_errors. Write
> +Description:	Read-Write. Read this file to get errors detected on FME. The
> +		readout value has embedded 4 bits revision attribution which
> +		reserved by hardware. Write this file to clear errors logged
> +		in fme_errors. Clean Bit[59:56] before write this file. Write
>  		fials with -EINVAL if input parsing fails or input error code
>  		doesn't match.
> 
> diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
> index 51c2892ec06d..a440bc09938d 100644
> --- a/drivers/fpga/dfl-fme-error.c
> +++ b/drivers/fpga/dfl-fme-error.c
> @@ -39,6 +39,22 @@
> 
>  #define ERROR_MASK		GENMASK_ULL(63, 0)
> 
> +/* Bit[59:56] was reserved by software for error revision */
> +#define ERROR_SW_REVISION_MASK GENMASK_ULL(59, 56)
> +
> +static u64 set_error_revision(struct device *dev, u64 value)
> +{
> +	void __iomem *base;
> +	u64 revision;
> +	u64 dfh;
> +
> +	base = dfl_get_feature_ioaddr_by_id(dev,
> FME_FEATURE_ID_GLOBAL_ERR);
> +	dfh = readq(base);
> +	revision = FIELD_GET(DFH_REVISION, dfh);
> +
> +	return value | FIELD_PREP(ERROR_SW_REVISION_MASK, revision);
> +}
> +
>  static ssize_t pcie0_errors_show(struct device *dev,
>  				 struct device_attribute *attr, char *buf)
>  {
> @@ -52,7 +68,8 @@ static ssize_t pcie0_errors_show(struct device *dev,
>  	value = readq(base + PCIE0_ERROR);
>  	mutex_unlock(&pdata->lock);
> 
> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> +	return sprintf(buf, "0x%llx\n",
> +		       (unsigned long long)set_error_revision(dev, value));
>  }
> 
>  static ssize_t pcie0_errors_store(struct device *dev,
> @@ -97,7 +114,8 @@ static ssize_t pcie1_errors_show(struct device *dev,
>  	value = readq(base + PCIE1_ERROR);
>  	mutex_unlock(&pdata->lock);
> 
> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> +	return sprintf(buf, "0x%llx\n",
> +		       (unsigned long long)set_error_revision(dev, value));
>  }
> 
>  static ssize_t pcie1_errors_store(struct device *dev,
> @@ -133,11 +151,13 @@ static ssize_t nonfatal_errors_show(struct device
> *dev,
>  				    struct device_attribute *attr, char *buf)
>  {
>  	void __iomem *base;
> +	u64 value;
> 
>  	base = dfl_get_feature_ioaddr_by_id(dev,
> FME_FEATURE_ID_GLOBAL_ERR);
> +	value = readq(base + RAS_NONFAT_ERROR);
> 
>  	return sprintf(buf, "0x%llx\n",
> -		       (unsigned long long)readq(base + RAS_NONFAT_ERROR));
> +		       (unsigned long long)set_error_revision(dev, value));
>  }
>  static DEVICE_ATTR_RO(nonfatal_errors);
> 
> @@ -145,11 +165,13 @@ static ssize_t catfatal_errors_show(struct device *dev,
>  				    struct device_attribute *attr, char *buf)
>  {
>  	void __iomem *base;
> +	u64 value;
> 
>  	base = dfl_get_feature_ioaddr_by_id(dev,
> FME_FEATURE_ID_GLOBAL_ERR);
> +	value = readq(base + RAS_CATFAT_ERROR);
> 
>  	return sprintf(buf, "0x%llx\n",
> -		       (unsigned long long)readq(base + RAS_CATFAT_ERROR));
> +		       (unsigned long long)set_error_revision(dev, value));
>  }
>  static DEVICE_ATTR_RO(catfatal_errors);
> 
> @@ -165,9 +187,10 @@ static ssize_t inject_errors_show(struct device *dev,
>  	mutex_lock(&pdata->lock);
>  	v = readq(base + RAS_ERROR_INJECT);
>  	mutex_unlock(&pdata->lock);
> +	v = FIELD_GET(INJECT_ERROR_MASK, v);
> 
>  	return sprintf(buf, "0x%llx\n",
> -		       (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
> +		       (unsigned long long)set_error_revision(dev, v));
>  }
> 
>  static ssize_t inject_errors_store(struct device *dev,
> @@ -211,7 +234,8 @@ static ssize_t fme_errors_show(struct device *dev,
>  	value = readq(base + FME_ERROR);
>  	mutex_unlock(&pdata->lock);
> 
> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> +	return sprintf(buf, "0x%llx\n",
> +		       (unsigned long long)set_error_revision(dev, value));
>  }
> 
>  static ssize_t fme_errors_store(struct device *dev,
> --
> 2.26.2
Russ Weight July 26, 2022, 11:46 p.m. UTC | #2
On 5/29/22 19:35, Wu, Hao wrote:
>> Subject: [PATCH v3] fpga: dfl: fme: adding reserved bits for revision of FME/Port
>> error
>>
>> From: Tianfei zhang <tianfei.zhang@intel.com>
>>
>> There are 2 different register layouts for FME/Port error
>> registers. The older PAC card like N3000 used the older
>> register layout, but to improve the scalability, the new
>> production like Intel PAC N6000 plans to deploy the new
>> register format.
>>
>> To distinguish the register layouts, we provide another sysfs
>> node for revision info, but it is a bad method that using one
>> sysfs node's value to determine the usage of other sysfs node.
>>
>> This patch introduces 4 reserved bits (Bit[59:56]) in error
>> register to store the revision value which readout from
>> DFH_REVISION field in DFH register, the DFH_REVSION field also
>> 4 bits width. DFL driver appends the FME/Port error
>> revision info in those bits for attribution on the readout
>> value.
> My current feeling is that the essential requirement now, is
> that userspace requires to know what the error types on the
> current FPGA card, as FPGA logic can be customized by different
> user. Ideally it should expose something like GUID to userspace,
> but now there is no such design, if GUID design could be finalized
> then we can just expose that info via GUID sysfs interface, if this
> is not possible, then I feel even revision sysfs interface is better
> than this kind of hack.

Yes - I was thinking that these registers provided a single error
value which could be augmented with the revision number to ensure
unique values across boards with different implementations. However,
embedding the revision number isn't helpful for a bitfield.

Can you elaborate on the GUID design? Wouldn't this create a situation
where you have to read the GUID to know how to interpret the error
register? If so, how is this different or better than reading a
revision number to know how to interpret the error register?

Maybe I am misunderstanding the problem or the suggestion?

Thanks,
- Russ
>
>> Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
>> ---
>> v3:
>>  - no code change, just add some explanation of two register layouts in git
>> message.
>> v2:
>>  - add documentation for this change.
>>  - fix the reverse xmas tree declaration.
>> ---
>>  .../ABI/testing/sysfs-platform-dfl-fme        | 37 ++++++++++++-------
>>  drivers/fpga/dfl-fme-error.c                  | 36 +++++++++++++++---
>>  2 files changed, 54 insertions(+), 19 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme
>> b/Documentation/ABI/testing/sysfs-platform-dfl-fme
>> index d6ab34e81b9b..b886568d6071 100644
>> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
>> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
>> @@ -50,46 +50,57 @@ Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>>  Description:	Read-Write. Read this file for errors detected on pcie0 link.
>> -		Write this file to clear errors logged in pcie0_errors. Write
>> -		fails with -EINVAL if input parsing fails or input error code
>> -		doesn't match.
>> +		The readout value has embedded 4 bits revision attribution
>> +		in Bit[59:56] which reserved by hardware. Write this file to
>> +		clear errors logged in pcie0_errors. Clean Bit[59:56] before
>> +		write this file. Write fails with -EINVAL if input parsing
>> +		fails or input error code doesn't match.
>>
>>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
>>  Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>>  Description:	Read-Write. Read this file for errors detected on pcie1 link.
>> -		Write this file to clear errors logged in pcie1_errors. Write
>> -		fails with -EINVAL if input parsing fails or input error code
>> -		doesn't match.
>> +		The readout value has embedded 4 bits revision attribution
>> +		in Bit[59:56] which reserved by hardware. Write this file to
>> +		clear errors logged in pcie1_errors. Clean Bit[59:56] before
>> +		write this file. Write fails with -EINVAL if input parsing fails
>> +		or input error code doesn't match.
>>
>>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
>>  Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>> -Description:	Read-only. It returns non-fatal errors detected.
>> +Description:	Read-only. It returns non-fatal errors detected. The readout
>> +		value has embedded 4 bits revision attribution in Bit[59:56]
>> +		which reserved by hardware.
>>
>>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
>>  Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>>  Description:	Read-only. It returns catastrophic and fatal errors detected.
>> +		The readout value has embedded 4 bits revision attribution in
>> +		Bit[59:56] which reserved by hardware.
>>
>>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/inject_errors
>>  Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>> -Description:	Read-Write. Read this file to check errors injected. Write this
>> -		file to inject errors for testing purpose. Write fails with
>> -		-EINVAL if input parsing fails or input inject error code isn't
>> -		supported.
>> +Description:	Read-Write. Read this file to check errors injected. The readout
>> +		value has embedded 4 bits revision attribution which reserved
>> by
>> +		hardware. Write this file to inject errors for testing purpose.
>> +		Clean Bit[59:56] before write this file. Write fails with -EINVAL
>> +		if input parsing fails or input inject error code isn't supported.
>>
>>  What:		/sys/bus/platform/devices/dfl-fme.0/errors/fme_errors
>>  Date:		August 2019
>>  KernelVersion:  5.4
>>  Contact:	Wu Hao <hao.wu@intel.com>
>> -Description:	Read-Write. Read this file to get errors detected on FME.
>> -		Write this file to clear errors logged in fme_errors. Write
>> +Description:	Read-Write. Read this file to get errors detected on FME. The
>> +		readout value has embedded 4 bits revision attribution which
>> +		reserved by hardware. Write this file to clear errors logged
>> +		in fme_errors. Clean Bit[59:56] before write this file. Write
>>  		fials with -EINVAL if input parsing fails or input error code
>>  		doesn't match.
>>
>> diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
>> index 51c2892ec06d..a440bc09938d 100644
>> --- a/drivers/fpga/dfl-fme-error.c
>> +++ b/drivers/fpga/dfl-fme-error.c
>> @@ -39,6 +39,22 @@
>>
>>  #define ERROR_MASK		GENMASK_ULL(63, 0)
>>
>> +/* Bit[59:56] was reserved by software for error revision */
>> +#define ERROR_SW_REVISION_MASK GENMASK_ULL(59, 56)
>> +
>> +static u64 set_error_revision(struct device *dev, u64 value)
>> +{
>> +	void __iomem *base;
>> +	u64 revision;
>> +	u64 dfh;
>> +
>> +	base = dfl_get_feature_ioaddr_by_id(dev,
>> FME_FEATURE_ID_GLOBAL_ERR);
>> +	dfh = readq(base);
>> +	revision = FIELD_GET(DFH_REVISION, dfh);
>> +
>> +	return value | FIELD_PREP(ERROR_SW_REVISION_MASK, revision);
>> +}
>> +
>>  static ssize_t pcie0_errors_show(struct device *dev,
>>  				 struct device_attribute *attr, char *buf)
>>  {
>> @@ -52,7 +68,8 @@ static ssize_t pcie0_errors_show(struct device *dev,
>>  	value = readq(base + PCIE0_ERROR);
>>  	mutex_unlock(&pdata->lock);
>>
>> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
>> +	return sprintf(buf, "0x%llx\n",
>> +		       (unsigned long long)set_error_revision(dev, value));
>>  }
>>
>>  static ssize_t pcie0_errors_store(struct device *dev,
>> @@ -97,7 +114,8 @@ static ssize_t pcie1_errors_show(struct device *dev,
>>  	value = readq(base + PCIE1_ERROR);
>>  	mutex_unlock(&pdata->lock);
>>
>> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
>> +	return sprintf(buf, "0x%llx\n",
>> +		       (unsigned long long)set_error_revision(dev, value));
>>  }
>>
>>  static ssize_t pcie1_errors_store(struct device *dev,
>> @@ -133,11 +151,13 @@ static ssize_t nonfatal_errors_show(struct device
>> *dev,
>>  				    struct device_attribute *attr, char *buf)
>>  {
>>  	void __iomem *base;
>> +	u64 value;
>>
>>  	base = dfl_get_feature_ioaddr_by_id(dev,
>> FME_FEATURE_ID_GLOBAL_ERR);
>> +	value = readq(base + RAS_NONFAT_ERROR);
>>
>>  	return sprintf(buf, "0x%llx\n",
>> -		       (unsigned long long)readq(base + RAS_NONFAT_ERROR));
>> +		       (unsigned long long)set_error_revision(dev, value));
>>  }
>>  static DEVICE_ATTR_RO(nonfatal_errors);
>>
>> @@ -145,11 +165,13 @@ static ssize_t catfatal_errors_show(struct device *dev,
>>  				    struct device_attribute *attr, char *buf)
>>  {
>>  	void __iomem *base;
>> +	u64 value;
>>
>>  	base = dfl_get_feature_ioaddr_by_id(dev,
>> FME_FEATURE_ID_GLOBAL_ERR);
>> +	value = readq(base + RAS_CATFAT_ERROR);
>>
>>  	return sprintf(buf, "0x%llx\n",
>> -		       (unsigned long long)readq(base + RAS_CATFAT_ERROR));
>> +		       (unsigned long long)set_error_revision(dev, value));
>>  }
>>  static DEVICE_ATTR_RO(catfatal_errors);
>>
>> @@ -165,9 +187,10 @@ static ssize_t inject_errors_show(struct device *dev,
>>  	mutex_lock(&pdata->lock);
>>  	v = readq(base + RAS_ERROR_INJECT);
>>  	mutex_unlock(&pdata->lock);
>> +	v = FIELD_GET(INJECT_ERROR_MASK, v);
>>
>>  	return sprintf(buf, "0x%llx\n",
>> -		       (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
>> +		       (unsigned long long)set_error_revision(dev, v));
>>  }
>>
>>  static ssize_t inject_errors_store(struct device *dev,
>> @@ -211,7 +234,8 @@ static ssize_t fme_errors_show(struct device *dev,
>>  	value = readq(base + FME_ERROR);
>>  	mutex_unlock(&pdata->lock);
>>
>> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
>> +	return sprintf(buf, "0x%llx\n",
>> +		       (unsigned long long)set_error_revision(dev, value));
>>  }
>>
>>  static ssize_t fme_errors_store(struct device *dev,
>> --
>> 2.26.2
Wu, Hao Aug. 2, 2022, 7:07 a.m. UTC | #3
> On 5/29/22 19:35, Wu, Hao wrote:
> >> Subject: [PATCH v3] fpga: dfl: fme: adding reserved bits for revision of
> FME/Port
> >> error
> >>
> >> From: Tianfei zhang <tianfei.zhang@intel.com>
> >>
> >> There are 2 different register layouts for FME/Port error
> >> registers. The older PAC card like N3000 used the older
> >> register layout, but to improve the scalability, the new
> >> production like Intel PAC N6000 plans to deploy the new
> >> register format.
> >>
> >> To distinguish the register layouts, we provide another sysfs
> >> node for revision info, but it is a bad method that using one
> >> sysfs node's value to determine the usage of other sysfs node.
> >>
> >> This patch introduces 4 reserved bits (Bit[59:56]) in error
> >> register to store the revision value which readout from
> >> DFH_REVISION field in DFH register, the DFH_REVSION field also
> >> 4 bits width. DFL driver appends the FME/Port error
> >> revision info in those bits for attribution on the readout
> >> value.
> > My current feeling is that the essential requirement now, is
> > that userspace requires to know what the error types on the
> > current FPGA card, as FPGA logic can be customized by different
> > user. Ideally it should expose something like GUID to userspace,
> > but now there is no such design, if GUID design could be finalized
> > then we can just expose that info via GUID sysfs interface, if this
> > is not possible, then I feel even revision sysfs interface is better
> > than this kind of hack.
> 
> Yes - I was thinking that these registers provided a single error
> value which could be augmented with the revision number to ensure
> unique values across boards with different implementations. However,
> embedding the revision number isn't helpful for a bitfield.
> 
> Can you elaborate on the GUID design? Wouldn't this create a situation
> where you have to read the GUID to know how to interpret the error
> register? If so, how is this different or better than reading a
> revision number to know how to interpret the error register?
> 
> Maybe I am misunderstanding the problem or the suggestion?

Similar as AFU, user read GUID to know what's inside AFU, but
userspace interface to access AFU is not changed. For FME/Port
Error, user read GUID to know which errors are implemented, but
userspace interface to access these errors is not changed.
GUID should be better if we consider long term, revision is too
small. And so far I am not sure if revision will be used for other
purpose or not which can make things more complicated to
maintain. So dedicated GUID for this purpose sounds a better
solution. 

Thanks
Hao

> 
> Thanks,
> - Russ
> >
> >> Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
> >> ---
> >> v3:
> >>  - no code change, just add some explanation of two register layouts in git
> >> message.
> >> v2:
> >>  - add documentation for this change.
> >>  - fix the reverse xmas tree declaration.
> >> ---
> >>  .../ABI/testing/sysfs-platform-dfl-fme        | 37 ++++++++++++-------
> >>  drivers/fpga/dfl-fme-error.c                  | 36 +++++++++++++++---
> >>  2 files changed, 54 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> >> b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> >> index d6ab34e81b9b..b886568d6071 100644
> >> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> >> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> >> @@ -50,46 +50,57 @@ Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >>  Description:	Read-Write. Read this file for errors detected on pcie0 link.
> >> -		Write this file to clear errors logged in pcie0_errors. Write
> >> -		fails with -EINVAL if input parsing fails or input error code
> >> -		doesn't match.
> >> +		The readout value has embedded 4 bits revision attribution
> >> +		in Bit[59:56] which reserved by hardware. Write this file to
> >> +		clear errors logged in pcie0_errors. Clean Bit[59:56] before
> >> +		write this file. Write fails with -EINVAL if input parsing
> >> +		fails or input error code doesn't match.
> >>
> >>  What:		/sys/bus/platform/devices/dfl-
> fme.0/errors/pcie1_errors
> >>  Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >>  Description:	Read-Write. Read this file for errors detected on pcie1 link.
> >> -		Write this file to clear errors logged in pcie1_errors. Write
> >> -		fails with -EINVAL if input parsing fails or input error code
> >> -		doesn't match.
> >> +		The readout value has embedded 4 bits revision attribution
> >> +		in Bit[59:56] which reserved by hardware. Write this file to
> >> +		clear errors logged in pcie1_errors. Clean Bit[59:56] before
> >> +		write this file. Write fails with -EINVAL if input parsing fails
> >> +		or input error code doesn't match.
> >>
> >>  What:		/sys/bus/platform/devices/dfl-
> fme.0/errors/nonfatal_errors
> >>  Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >> -Description:	Read-only. It returns non-fatal errors detected.
> >> +Description:	Read-only. It returns non-fatal errors detected. The
> readout
> >> +		value has embedded 4 bits revision attribution in Bit[59:56]
> >> +		which reserved by hardware.
> >>
> >>  What:		/sys/bus/platform/devices/dfl-
> fme.0/errors/catfatal_errors
> >>  Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >>  Description:	Read-only. It returns catastrophic and fatal errors detected.
> >> +		The readout value has embedded 4 bits revision attribution
> in
> >> +		Bit[59:56] which reserved by hardware.
> >>
> >>  What:		/sys/bus/platform/devices/dfl-
> fme.0/errors/inject_errors
> >>  Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >> -Description:	Read-Write. Read this file to check errors injected.
> Write this
> >> -		file to inject errors for testing purpose. Write fails with
> >> -		-EINVAL if input parsing fails or input inject error code isn't
> >> -		supported.
> >> +Description:	Read-Write. Read this file to check errors injected.
> The readout
> >> +		value has embedded 4 bits revision attribution which
> reserved
> >> by
> >> +		hardware. Write this file to inject errors for testing purpose.
> >> +		Clean Bit[59:56] before write this file. Write fails with -EINVAL
> >> +		if input parsing fails or input inject error code isn't supported.
> >>
> >>  What:		/sys/bus/platform/devices/dfl-
> fme.0/errors/fme_errors
> >>  Date:		August 2019
> >>  KernelVersion:  5.4
> >>  Contact:	Wu Hao <hao.wu@intel.com>
> >> -Description:	Read-Write. Read this file to get errors detected on
> FME.
> >> -		Write this file to clear errors logged in fme_errors. Write
> >> +Description:	Read-Write. Read this file to get errors detected on
> FME. The
> >> +		readout value has embedded 4 bits revision attribution which
> >> +		reserved by hardware. Write this file to clear errors logged
> >> +		in fme_errors. Clean Bit[59:56] before write this file. Write
> >>  		fials with -EINVAL if input parsing fails or input error code
> >>  		doesn't match.
> >>
> >> diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
> >> index 51c2892ec06d..a440bc09938d 100644
> >> --- a/drivers/fpga/dfl-fme-error.c
> >> +++ b/drivers/fpga/dfl-fme-error.c
> >> @@ -39,6 +39,22 @@
> >>
> >>  #define ERROR_MASK		GENMASK_ULL(63, 0)
> >>
> >> +/* Bit[59:56] was reserved by software for error revision */
> >> +#define ERROR_SW_REVISION_MASK GENMASK_ULL(59, 56)
> >> +
> >> +static u64 set_error_revision(struct device *dev, u64 value)
> >> +{
> >> +	void __iomem *base;
> >> +	u64 revision;
> >> +	u64 dfh;
> >> +
> >> +	base = dfl_get_feature_ioaddr_by_id(dev,
> >> FME_FEATURE_ID_GLOBAL_ERR);
> >> +	dfh = readq(base);
> >> +	revision = FIELD_GET(DFH_REVISION, dfh);
> >> +
> >> +	return value | FIELD_PREP(ERROR_SW_REVISION_MASK, revision);
> >> +}
> >> +
> >>  static ssize_t pcie0_errors_show(struct device *dev,
> >>  				 struct device_attribute *attr, char *buf)
> >>  {
> >> @@ -52,7 +68,8 @@ static ssize_t pcie0_errors_show(struct device *dev,
> >>  	value = readq(base + PCIE0_ERROR);
> >>  	mutex_unlock(&pdata->lock);
> >>
> >> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> >> +	return sprintf(buf, "0x%llx\n",
> >> +		       (unsigned long long)set_error_revision(dev, value));
> >>  }
> >>
> >>  static ssize_t pcie0_errors_store(struct device *dev,
> >> @@ -97,7 +114,8 @@ static ssize_t pcie1_errors_show(struct device *dev,
> >>  	value = readq(base + PCIE1_ERROR);
> >>  	mutex_unlock(&pdata->lock);
> >>
> >> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> >> +	return sprintf(buf, "0x%llx\n",
> >> +		       (unsigned long long)set_error_revision(dev, value));
> >>  }
> >>
> >>  static ssize_t pcie1_errors_store(struct device *dev,
> >> @@ -133,11 +151,13 @@ static ssize_t nonfatal_errors_show(struct
> device
> >> *dev,
> >>  				    struct device_attribute *attr, char *buf)
> >>  {
> >>  	void __iomem *base;
> >> +	u64 value;
> >>
> >>  	base = dfl_get_feature_ioaddr_by_id(dev,
> >> FME_FEATURE_ID_GLOBAL_ERR);
> >> +	value = readq(base + RAS_NONFAT_ERROR);
> >>
> >>  	return sprintf(buf, "0x%llx\n",
> >> -		       (unsigned long long)readq(base + RAS_NONFAT_ERROR));
> >> +		       (unsigned long long)set_error_revision(dev, value));
> >>  }
> >>  static DEVICE_ATTR_RO(nonfatal_errors);
> >>
> >> @@ -145,11 +165,13 @@ static ssize_t catfatal_errors_show(struct device
> *dev,
> >>  				    struct device_attribute *attr, char *buf)
> >>  {
> >>  	void __iomem *base;
> >> +	u64 value;
> >>
> >>  	base = dfl_get_feature_ioaddr_by_id(dev,
> >> FME_FEATURE_ID_GLOBAL_ERR);
> >> +	value = readq(base + RAS_CATFAT_ERROR);
> >>
> >>  	return sprintf(buf, "0x%llx\n",
> >> -		       (unsigned long long)readq(base + RAS_CATFAT_ERROR));
> >> +		       (unsigned long long)set_error_revision(dev, value));
> >>  }
> >>  static DEVICE_ATTR_RO(catfatal_errors);
> >>
> >> @@ -165,9 +187,10 @@ static ssize_t inject_errors_show(struct device
> *dev,
> >>  	mutex_lock(&pdata->lock);
> >>  	v = readq(base + RAS_ERROR_INJECT);
> >>  	mutex_unlock(&pdata->lock);
> >> +	v = FIELD_GET(INJECT_ERROR_MASK, v);
> >>
> >>  	return sprintf(buf, "0x%llx\n",
> >> -		       (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
> >> +		       (unsigned long long)set_error_revision(dev, v));
> >>  }
> >>
> >>  static ssize_t inject_errors_store(struct device *dev,
> >> @@ -211,7 +234,8 @@ static ssize_t fme_errors_show(struct device *dev,
> >>  	value = readq(base + FME_ERROR);
> >>  	mutex_unlock(&pdata->lock);
> >>
> >> -	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
> >> +	return sprintf(buf, "0x%llx\n",
> >> +		       (unsigned long long)set_error_revision(dev, value));
> >>  }
> >>
> >>  static ssize_t fme_errors_store(struct device *dev,
> >> --
> >> 2.26.2
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index d6ab34e81b9b..b886568d6071 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -50,46 +50,57 @@  Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
 Description:	Read-Write. Read this file for errors detected on pcie0 link.
-		Write this file to clear errors logged in pcie0_errors. Write
-		fails with -EINVAL if input parsing fails or input error code
-		doesn't match.
+		The readout value has embedded 4 bits revision attribution
+		in Bit[59:56] which reserved by hardware. Write this file to
+		clear errors logged in pcie0_errors. Clean Bit[59:56] before
+		write this file. Write fails with -EINVAL if input parsing
+		fails or input error code doesn't match.
 
 What:		/sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
 Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
 Description:	Read-Write. Read this file for errors detected on pcie1 link.
-		Write this file to clear errors logged in pcie1_errors. Write
-		fails with -EINVAL if input parsing fails or input error code
-		doesn't match.
+		The readout value has embedded 4 bits revision attribution
+		in Bit[59:56] which reserved by hardware. Write this file to
+		clear errors logged in pcie1_errors. Clean Bit[59:56] before
+		write this file. Write fails with -EINVAL if input parsing fails
+		or input error code doesn't match.
 
 What:		/sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
 Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
-Description:	Read-only. It returns non-fatal errors detected.
+Description:	Read-only. It returns non-fatal errors detected. The readout
+		value has embedded 4 bits revision attribution in Bit[59:56]
+		which reserved by hardware.
 
 What:		/sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
 Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
 Description:	Read-only. It returns catastrophic and fatal errors detected.
+		The readout value has embedded 4 bits revision attribution in
+		Bit[59:56] which reserved by hardware.
 
 What:		/sys/bus/platform/devices/dfl-fme.0/errors/inject_errors
 Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
-Description:	Read-Write. Read this file to check errors injected. Write this
-		file to inject errors for testing purpose. Write fails with
-		-EINVAL if input parsing fails or input inject error code isn't
-		supported.
+Description:	Read-Write. Read this file to check errors injected. The readout
+		value has embedded 4 bits revision attribution which reserved by
+		hardware. Write this file to inject errors for testing purpose.
+		Clean Bit[59:56] before write this file. Write fails with -EINVAL
+		if input parsing fails or input inject error code isn't supported.
 
 What:		/sys/bus/platform/devices/dfl-fme.0/errors/fme_errors
 Date:		August 2019
 KernelVersion:  5.4
 Contact:	Wu Hao <hao.wu@intel.com>
-Description:	Read-Write. Read this file to get errors detected on FME.
-		Write this file to clear errors logged in fme_errors. Write
+Description:	Read-Write. Read this file to get errors detected on FME. The
+		readout value has embedded 4 bits revision attribution which
+		reserved by hardware. Write this file to clear errors logged
+		in fme_errors. Clean Bit[59:56] before write this file. Write
 		fials with -EINVAL if input parsing fails or input error code
 		doesn't match.
 
diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
index 51c2892ec06d..a440bc09938d 100644
--- a/drivers/fpga/dfl-fme-error.c
+++ b/drivers/fpga/dfl-fme-error.c
@@ -39,6 +39,22 @@ 
 
 #define ERROR_MASK		GENMASK_ULL(63, 0)
 
+/* Bit[59:56] was reserved by software for error revision */
+#define ERROR_SW_REVISION_MASK GENMASK_ULL(59, 56)
+
+static u64 set_error_revision(struct device *dev, u64 value)
+{
+	void __iomem *base;
+	u64 revision;
+	u64 dfh;
+
+	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
+	dfh = readq(base);
+	revision = FIELD_GET(DFH_REVISION, dfh);
+
+	return value | FIELD_PREP(ERROR_SW_REVISION_MASK, revision);
+}
+
 static ssize_t pcie0_errors_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
@@ -52,7 +68,8 @@  static ssize_t pcie0_errors_show(struct device *dev,
 	value = readq(base + PCIE0_ERROR);
 	mutex_unlock(&pdata->lock);
 
-	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
+	return sprintf(buf, "0x%llx\n",
+		       (unsigned long long)set_error_revision(dev, value));
 }
 
 static ssize_t pcie0_errors_store(struct device *dev,
@@ -97,7 +114,8 @@  static ssize_t pcie1_errors_show(struct device *dev,
 	value = readq(base + PCIE1_ERROR);
 	mutex_unlock(&pdata->lock);
 
-	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
+	return sprintf(buf, "0x%llx\n",
+		       (unsigned long long)set_error_revision(dev, value));
 }
 
 static ssize_t pcie1_errors_store(struct device *dev,
@@ -133,11 +151,13 @@  static ssize_t nonfatal_errors_show(struct device *dev,
 				    struct device_attribute *attr, char *buf)
 {
 	void __iomem *base;
+	u64 value;
 
 	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
+	value = readq(base + RAS_NONFAT_ERROR);
 
 	return sprintf(buf, "0x%llx\n",
-		       (unsigned long long)readq(base + RAS_NONFAT_ERROR));
+		       (unsigned long long)set_error_revision(dev, value));
 }
 static DEVICE_ATTR_RO(nonfatal_errors);
 
@@ -145,11 +165,13 @@  static ssize_t catfatal_errors_show(struct device *dev,
 				    struct device_attribute *attr, char *buf)
 {
 	void __iomem *base;
+	u64 value;
 
 	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
+	value = readq(base + RAS_CATFAT_ERROR);
 
 	return sprintf(buf, "0x%llx\n",
-		       (unsigned long long)readq(base + RAS_CATFAT_ERROR));
+		       (unsigned long long)set_error_revision(dev, value));
 }
 static DEVICE_ATTR_RO(catfatal_errors);
 
@@ -165,9 +187,10 @@  static ssize_t inject_errors_show(struct device *dev,
 	mutex_lock(&pdata->lock);
 	v = readq(base + RAS_ERROR_INJECT);
 	mutex_unlock(&pdata->lock);
+	v = FIELD_GET(INJECT_ERROR_MASK, v);
 
 	return sprintf(buf, "0x%llx\n",
-		       (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
+		       (unsigned long long)set_error_revision(dev, v));
 }
 
 static ssize_t inject_errors_store(struct device *dev,
@@ -211,7 +234,8 @@  static ssize_t fme_errors_show(struct device *dev,
 	value = readq(base + FME_ERROR);
 	mutex_unlock(&pdata->lock);
 
-	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
+	return sprintf(buf, "0x%llx\n",
+		       (unsigned long long)set_error_revision(dev, value));
 }
 
 static ssize_t fme_errors_store(struct device *dev,