fpga: altera-cvp: allow interrupt to continue next time

Message ID	20220518073844.2713722-1-tien.sung.ang@intel.com (mailing list archive)
State	New
Headers	show Return-Path: <linux-fpga-owner@kernel.org> From: tien.sung.ang@intel.com To: mdf@kernel.org, hao.wu@intel.com, yilun.xu@intel.com, trix@redhat.com Cc: linux-fpga@vger.kernel.org, linux-kernel@vger.kernel.org, Dinh Nguyen <dinh.nguyen@intel.com>, Ang Tien Sung <tien.sung.ang@intel.com> Subject: [PATCH] fpga: altera-cvp: allow interrupt to continue next time Date: Wed, 18 May 2022 15:38:44 +0800 Message-Id: <20220518073844.2713722-1-tien.sung.ang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	fpga: altera-cvp: allow interrupt to continue next time \| expand fpga: altera-cvp: allow interrupt to continue next time

Ang, Tien Sung May 18, 2022, 7:38 a.m. UTC

From: Dinh Nguyen <dinh.nguyen@intel.com>

CFG_READY signal/bit may time-out due to firmware not responding
within the given time-out. This time varies due to numerous
factors like size of bitstream and others.
This time-out error does not impact the result of the CvP
previous transactions. The CvP driver shall then, respond with
EAGAIN instead Time out error.

Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
---
 drivers/fpga/altera-cvp.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Tom Rix May 18, 2022, 2:04 p.m. UTC | #1

On 5/18/22 12:38 AM, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
>
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with
> EAGAIN instead Time out error.
>
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>   drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>   	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>   	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>   				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>   		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>   
>   	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;

This will set fpga_mgr->state to *_ERR.

Is this ok or do you think we need a couple new of *_BUSY enums ?

Tom

> +
>   }
>   
>   static int altera_cvp_write_init(struct fpga_manager *mgr,

Ang, Tien Sung May 19, 2022, 9:39 a.m. UTC | #2

Thanks for bringing this up. Yes, you are right that the fpga_mgr sees this
as an error irrespective of the value. The CvP driver is changed now to just
indicate the correct error which recommends a retry. To me understanding,
EAGAIN was this. The fpga manager now looks like is going to return a CvP
failure in short. 
A BUSY state does not seem to be able to solve this issue. 
Even an extended time-out didn't resolve this error state. The current time-out
is set to 10seconds.   
However, the main objective is to also handle the error if the CvP firmware
is not responsive. The error_path flow is to reset the CVP mode and HIP_CLK_SEL bit
as recommended by the firmware engineers. 
The flow prescribed here is also an identical copy of working CvP driver 
which is also owned by Intel. This driver is a downstream driver which is 
not part of the Linux kernel. We are now porting this differences over to 
the current upstream CvP driver.

Xu Yilun May 28, 2022, 12:04 p.m. UTC | #3

On Wed, May 18, 2022 at 03:38:44PM +0800, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
> 
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with

Do you mean the reprogramming is successful even if you find the time
out in write_complete()? Then return 0 is better?

And could you specify what the time-out mean on write_init() phase?

Thanks,
Yilun

> EAGAIN instead Time out error.
> 
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>  drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>  	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>  	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>  				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>  		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>  
>  	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;
> +
>  }
>  
>  static int altera_cvp_write_init(struct fpga_manager *mgr,
> -- 
> 2.25.1

Xu Yilun May 28, 2022, 12:05 p.m. UTC | #4

On Thu, May 19, 2022 at 05:39:07PM +0800, tien.sung.ang@intel.com wrote:
> Thanks for bringing this up. Yes, you are right that the fpga_mgr sees this
> as an error irrespective of the value. The CvP driver is changed now to just
> indicate the correct error which recommends a retry. To me understanding,
> EAGAIN was this. The fpga manager now looks like is going to return a CvP
> failure in short. 
> A BUSY state does not seem to be able to solve this issue. 
> Even an extended time-out didn't resolve this error state. The current time-out
> is set to 10seconds.   
> However, the main objective is to also handle the error if the CvP firmware
> is not responsive. The error_path flow is to reset the CVP mode and HIP_CLK_SEL bit

Please add your main objective to commit message.

Thanks,
Yilun

> as recommended by the firmware engineers. 
> The flow prescribed here is also an identical copy of working CvP driver 
> which is also owned by Intel. This driver is a downstream driver which is 
> not part of the Linux kernel. We are now porting this differences over to 
> the current upstream CvP driver.

Ang, Tien Sung May 31, 2022, 2:20 a.m. UTC | #5

>> CFG_READY signal/bit may time-out due to firmware not responding
>> within the given time-out. This time varies due to numerous
>> factors like size of bitstream and others.
>> This time-out error does not impact the result of the CvP
>> previous transactions. The CvP driver shall then, respond with

>Do you mean the reprogramming is successful even if you find the time
>out in write_complete()? Then return 0 is better?
Based on the information given by the Intel FPGA firmware team,
CFG_READY is essential to indicate if the current FPGA 
configuration session is indeed a success. There are 
cases we test in the lab whereby, CFG_READY stays invalid and
the tests performed subsequently to verify the FPGA functionality
could not detect the failed session. A failed FPGA 
configuration session means, the new bitstream wasn't 
successfully configured and tests ran later will just be passing
on the previous working bitstream version. In short, CFG_READY
is esential, and an error indicating the time-out is a must.
Another example, using an incorrect SOF/Design FPGA results
in CFG_READY being invalid. The user must be informed of a 
potential error. 
I will correct the wordings i used earlier that says that
the timoeut error does not impact the results of the CvP
previous transactions. It may so if the firmware has some sort
of error. 

>And could you specify what the time-out mean on write_init() phase?
I could not really understand your question. We set huge 
time-outs of ~10seconds. Every wait for the firmware to respond
is potentially a hazard. The firmware CvP is has it's limitation
unfortunately.

Xu Yilun June 3, 2022, 10:14 a.m. UTC | #6

On Tue, May 31, 2022 at 10:20:04AM +0800, tien.sung.ang@intel.com wrote:
> >> CFG_READY signal/bit may time-out due to firmware not responding
> >> within the given time-out. This time varies due to numerous
> >> factors like size of bitstream and others.
> >> This time-out error does not impact the result of the CvP
> >> previous transactions. The CvP driver shall then, respond with
> 
> >Do you mean the reprogramming is successful even if you find the time
> >out in write_complete()? Then return 0 is better?
> Based on the information given by the Intel FPGA firmware team,
> CFG_READY is essential to indicate if the current FPGA 
> configuration session is indeed a success. There are 
> cases we test in the lab whereby, CFG_READY stays invalid and
> the tests performed subsequently to verify the FPGA functionality
> could not detect the failed session. A failed FPGA 
> configuration session means, the new bitstream wasn't 
> successfully configured and tests ran later will just be passing
> on the previous working bitstream version. In short, CFG_READY
> is esential, and an error indicating the time-out is a must.
> Another example, using an incorrect SOF/Design FPGA results
> in CFG_READY being invalid. The user must be informed of a 
> potential error. 
> I will correct the wordings i used earlier that says that
> the timoeut error does not impact the results of the CvP
> previous transactions. It may so if the firmware has some sort
> of error. 

Understood. But with your new comment why you must change the error
code to -EAGAIN rather than timeout?

I think you may change your commit message. The main change is adding
the error handling. The error code change is minor, even not necessary
if you don't have a strong reason.

Thanks,
Yilun

> 
> >And could you specify what the time-out mean on write_init() phase?
> I could not really understand your question. We set huge 
> time-outs of ~10seconds. Every wait for the firmware to respond
> is potentially a hazard. The firmware CvP is has it's limitation
> unfortunately.

fpga: altera-cvp: allow interrupt to continue next time

Commit Message

Comments

Patch