Message ID | 20220518073844.2713722-1-tien.sung.ang@intel.com (mailing list archive) |
---|---|
State | New |
Series | fpga: altera-cvp: allow interrupt to continue next time |

On 5/18/22 12:38 AM, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
>
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with
> EAGAIN instead Time out error.
>
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>  drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>  	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>  	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>  				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>  		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>
>  	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;

This will set fpga_mgr->state to *_ERR.

Is this ok or do you think we need a couple of new *_BUSY enums?

Tom

> +
>  }
>
>  static int altera_cvp_write_init(struct fpga_manager *mgr,

Thanks for bringing this up. Yes, you are right that the fpga_mgr sees
this as an error irrespective of the value. The CvP driver is now
changed to just indicate the correct error, one that recommends a retry.
To my understanding, EAGAIN does this. In short, it looks like the fpga
manager is now going to return a CvP failure.

A BUSY state does not seem to be able to solve this issue. Even an
extended time-out didn't resolve this error state. The current time-out
is set to 10 seconds.

However, the main objective is also to handle the error if the CvP
firmware is not responsive. The error_path flow resets the CVP mode and
HIP_CLK_SEL bits as recommended by the firmware engineers.

The flow prescribed here is also an identical copy of a working CvP
driver which is also owned by Intel. That driver is a downstream driver
which is not part of the Linux kernel. We are now porting these
differences over to the current upstream CvP driver.
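
To make the point about the manager state concrete, here is a minimal user-space sketch, not the kernel code. It assumes, as the exchange above suggests, that the caller only checks whether the driver hook returned nonzero, so -EAGAIN and -ETIMEDOUT end in the same error state. `mock_mgr`, `mock_write_complete`, the `hook_*` functions, and the `MOCK_STATE_*` values are hypothetical stand-ins, not fpga_mgr identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the manager states discussed in the thread. */
enum mock_state {
	MOCK_STATE_OPERATING,
	MOCK_STATE_WRITE_COMPLETE,
	MOCK_STATE_WRITE_COMPLETE_ERR,
};

struct mock_mgr {
	enum mock_state state;
	/* Driver hook: returns 0 on success or a negative errno. */
	int (*write_complete)(struct mock_mgr *mgr);
};

/* Driver hooks used for illustration only. */
static int hook_ok(struct mock_mgr *mgr) { (void)mgr; return 0; }
static int hook_eagain(struct mock_mgr *mgr) { (void)mgr; return -EAGAIN; }
static int hook_timedout(struct mock_mgr *mgr) { (void)mgr; return -ETIMEDOUT; }

/*
 * Assumed framework behaviour: only "ret != 0" is checked, so the
 * specific errno does not change the resulting state.
 */
static int mock_write_complete(struct mock_mgr *mgr)
{
	int ret;

	assert(mgr != NULL && mgr->write_complete != NULL);

	mgr->state = MOCK_STATE_WRITE_COMPLETE;
	ret = mgr->write_complete(mgr);
	if (ret) {
		mgr->state = MOCK_STATE_WRITE_COMPLETE_ERR;
		return ret;
	}

	mgr->state = MOCK_STATE_OPERATING;
	return 0;
}
```

Under that assumption, the errno value is still passed back to the caller, but the recorded state is the same for both error codes, which is why a retry hint via EAGAIN alone does not avoid the *_ERR state.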

On Wed, May 18, 2022 at 03:38:44PM +0800, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
>
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with

Do you mean the reprogramming is successful even if you find the time
out in write_complete()? Then return 0 is better?

And could you specify what the time-out means on write_init() phase?

Thanks,
Yilun

> EAGAIN instead Time out error.
>
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>  drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>  	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>  	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>  				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>  		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>
>  	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;
> +
>  }
>
>  static int altera_cvp_write_init(struct fpga_manager *mgr,
> --
> 2.25.1

On Thu, May 19, 2022 at 05:39:07PM +0800, tien.sung.ang@intel.com wrote:
> Thanks for bringing this up. Yes, you are right that the fpga_mgr sees
> this as an error irrespective of the value. The CvP driver is now
> changed to just indicate the correct error, one that recommends a retry.
> To my understanding, EAGAIN does this. In short, it looks like the fpga
> manager is now going to return a CvP failure.
> A BUSY state does not seem to be able to solve this issue.
> Even an extended time-out didn't resolve this error state. The current
> time-out is set to 10 seconds.
> However, the main objective is also to handle the error if the CvP
> firmware is not responsive. The error_path flow resets the CVP mode and
> HIP_CLK_SEL bits

Please add your main objective to the commit message.

Thanks,
Yilun

> as recommended by the firmware engineers.
> The flow prescribed here is also an identical copy of a working CvP
> driver which is also owned by Intel. That driver is a downstream driver
> which is not part of the Linux kernel. We are now porting these
> differences over to the current upstream CvP driver.

>> CFG_READY signal/bit may time-out due to firmware not responding
>> within the given time-out. This time varies due to numerous
>> factors like size of bitstream and others.
>> This time-out error does not impact the result of the CvP
>> previous transactions. The CvP driver shall then, respond with

> Do you mean the reprogramming is successful even if you find the time
> out in write_complete()? Then return 0 is better?

Based on the information given by the Intel FPGA firmware team,
CFG_READY is essential to indicate whether the current FPGA
configuration session is indeed a success. There are cases we test in
the lab where CFG_READY stays invalid and the tests performed
subsequently to verify the FPGA functionality could not detect the
failed session. A failed FPGA configuration session means the new
bitstream wasn't successfully configured, and tests run later will just
be passing on the previous working bitstream version. In short,
CFG_READY is essential, and an error indicating the time-out is a must.

Another example: using an incorrect SOF/Design FPGA results in CFG_READY
being invalid. The user must be informed of a potential error.

I will correct the wording I used earlier, which says that the time-out
error does not impact the results of the previous CvP transactions. It
may do so if the firmware has some sort of error.

> And could you specify what the time-out means on write_init() phase?

I could not really understand your question. We set huge time-outs of
~10 seconds. Every wait for the firmware to respond is potentially a
hazard. The CvP firmware has its limitations, unfortunately.
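
The argument above, that a CFG_READY time-out must surface as an error rather than be swallowed, can be illustrated with a small user-space sketch of a bounded poll. This is not the driver's `altera_cvp_wait_status()`; `mock_wait_status`, `mock_firmware_status`, and the `MOCK_STATUS_CFG_RDY` bit position are assumptions made up for the example, and an attempt counter stands in for elapsed time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define MOCK_STATUS_CFG_RDY (1u << 18)

/* Hypothetical register source: returns the current status word. */
typedef uint32_t (*mock_read_status_fn)(void *ctx);

/*
 * Poll until (status & mask) == expected or the attempt budget runs out.
 * A real driver would sleep between reads; attempts stand in for time.
 */
static int mock_wait_status(mock_read_status_fn read_status, void *ctx,
			    uint32_t mask, uint32_t expected,
			    unsigned int max_attempts)
{
	unsigned int i;

	assert(read_status != NULL);

	for (i = 0; i < max_attempts; i++) {
		if ((read_status(ctx) & mask) == expected)
			return 0;
	}

	/* The bit never reached the expected value: report it to the caller. */
	return -ETIMEDOUT;
}

/* Mock firmware: CFG_RDY stays set for *ctx more reads, then clears. */
static uint32_t mock_firmware_status(void *ctx)
{
	unsigned int *reads_until_clear = ctx;

	if (*reads_until_clear > 0) {
		(*reads_until_clear)--;
		return MOCK_STATUS_CFG_RDY;
	}
	return 0;
}
```

With a responsive mock firmware the poll returns 0; with one that never clears the bit inside the budget, the caller gets a negative errno and can tell that the session did not complete.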

On Tue, May 31, 2022 at 10:20:04AM +0800, tien.sung.ang@intel.com wrote:
> >> CFG_READY signal/bit may time-out due to firmware not responding
> >> within the given time-out. This time varies due to numerous
> >> factors like size of bitstream and others.
> >> This time-out error does not impact the result of the CvP
> >> previous transactions. The CvP driver shall then, respond with
>
> > Do you mean the reprogramming is successful even if you find the time
> > out in write_complete()? Then return 0 is better?
> Based on the information given by the Intel FPGA firmware team,
> CFG_READY is essential to indicate whether the current FPGA
> configuration session is indeed a success. There are cases we test in
> the lab where CFG_READY stays invalid and the tests performed
> subsequently to verify the FPGA functionality could not detect the
> failed session. A failed FPGA configuration session means the new
> bitstream wasn't successfully configured, and tests run later will just
> be passing on the previous working bitstream version. In short,
> CFG_READY is essential, and an error indicating the time-out is a must.
> Another example: using an incorrect SOF/Design FPGA results in CFG_READY
> being invalid. The user must be informed of a potential error.
> I will correct the wording I used earlier, which says that the time-out
> error does not impact the results of the previous CvP transactions. It
> may do so if the firmware has some sort of error.

Understood. But given your new comment, why must you change the error
code to -EAGAIN rather than a timeout?

I think you may change your commit message. The main change is adding
the error handling. The error code change is minor, and not even
necessary if you don't have a strong reason.

Thanks,
Yilun

>
> > And could you specify what the time-out means on write_init() phase?
> I could not really understand your question. We set huge time-outs of
> ~10 seconds. Every wait for the firmware to respond is potentially a
> hazard. The CvP firmware has its limitations, unfortunately.

diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
index 4ffb9da537d8..d74ff63c61e8 100644
--- a/drivers/fpga/altera-cvp.c
+++ b/drivers/fpga/altera-cvp.c
@@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
 	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
 	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
 				     conf->priv->poll_time_us);
-	if (ret)
+	if (ret) {
 		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
+		goto error_path;
+	}
 
 	return ret;
+
+error_path:
+	/* reset CVP_MODE and HIP_CLK_SEL bit */
+	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
+	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
+	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
+	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
+
+	return -EAGAIN;
+
 }
 
 static int altera_cvp_write_init(struct fpga_manager *mgr,
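
For readers without the hardware, the error path in the patch can be approximated in user space. The sketch below only models the read-modify-write that clears the two mode bits and the -EAGAIN return; `mock_dev`, `mock_teardown_tail`, and the `MOCK_MODE_CTRL_*` bit positions are hypothetical and chosen for illustration, and a plain struct field stands in for the PCI config-space register.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions are assumptions for illustration only. */
#define MOCK_MODE_CTRL_CVP_MODE    (1u << 0)
#define MOCK_MODE_CTRL_HIP_CLK_SEL (1u << 1)

/* Hypothetical device: one mode-control register and a time-out flag. */
struct mock_dev {
	uint32_t mode_ctrl;
	int cfg_rdy_timed_out;	/* nonzero simulates the CFG_RDY poll failing */
};

/* Models only the tail of the teardown step shown in the patch. */
static int mock_teardown_tail(struct mock_dev *dev)
{
	uint32_t val;

	assert(dev != NULL);

	if (!dev->cfg_rdy_timed_out)
		return 0;

	/* Read-modify-write: clear only the two mode bits, keep the rest. */
	val = dev->mode_ctrl;
	val &= ~MOCK_MODE_CTRL_HIP_CLK_SEL;
	val &= ~MOCK_MODE_CTRL_CVP_MODE;
	dev->mode_ctrl = val;

	return -EAGAIN;
}
```

The read-modify-write form matters because the same register may carry other fields; masking out just the two bits leaves those untouched, while the success path does not modify the register at all.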