Message ID | 20240516140426.60439-1-wojciech.drewek@intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [iwl-net] ice: implement AQ download pkg retry | expand |
On Thu, May 16, 2024 at 04:04:26PM +0200, Wojciech Drewek wrote: > ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due > to FW issue. Fix this by retrying five times before moving to > Safe Mode. Hi Wojciech, As this is for iwl-net, and therefore net, please consider supplying a Fixes tag. Otherwise this looks good to me. > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> ...
On 5/16/2024 7:04 AM, Wojciech Drewek wrote: > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. > > > ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due > to FW issue. Fix this by retrying five times before moving to > Safe Mode. > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> > --- > drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c > index ce5034ed2b24..19e2111fcf08 100644 > --- a/drivers/net/ethernet/intel/ice/ice_ddp.c > +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c > @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, > > for (i = 0; i < count; i++) { > bool last = false; > + int try_cnt = 0; > int status; > > bh = (struct ice_buf_hdr *)(bufs + start + i); > @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, > if (indicate_last) > last = ice_is_last_download_buffer(bh, i, count); > > - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last, > - &offset, &info, NULL); > + while (try_cnt < 5) { > + status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, > + last, &offset, &info, > + NULL); > + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC && > + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) Are these the only 2 sporadic errors that FW will return? > + break; > + > + try_cnt++; > + msleep(20); > + } > + > + if (try_cnt) > + dev_dbg(ice_hw_to_dev(hw), > + "ice_aq_download_pkg failed, number of retries: %d\n", > + try_cnt); If try_cnt is non-zero it doesn't mean the last download failed, it just means one or more attempts to download failed right? Maybe just "ice_aq_download_pkg number of retries: %d" since the if (status) check below will print on failure? > > /* Save AQ status from download package */ > if (status) { > -- > 2.40.1 > >
> -----Original Message----- > From: Wojciech Drewek <wojciech.drewek@intel.com> > Sent: Thursday, May 16, 2024 7:34 PM > To: netdev@vger.kernel.org > Cc: intel-wired-lan@lists.osuosl.org > Subject: [PATCH iwl-net] ice: implement AQ download pkg retry > > ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due to FW > issue. Fix this by retrying five times before moving to Safe Mode. > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> > --- > drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c > b/drivers/net/ethernet/intel/ice/ice_ddp.c > index ce5034ed2b24..19e2111fcf08 100644 > --- a/drivers/net/ethernet/intel/ice/ice_ddp.c > +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c > @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct > ice_buf *bufs, u32 start, > > for (i = 0; i < count; i++) { > bool last = false; > + int try_cnt = 0; > int status; > > bh = (struct ice_buf_hdr *)(bufs + start + i); @@ -1346,8 > +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf > *bufs, u32 start, > if (indicate_last) > last = ice_is_last_download_buffer(bh, i, count); > > - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, > last, > - &offset, &info, NULL); > + while (try_cnt < 5) { > + status = ice_aq_download_pkg(hw, bh, > ICE_PKG_BUF_SIZE, > + last, &offset, &info, > + NULL); > + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC > && > + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) > + break; > + > + try_cnt++; > + msleep(20); > + } > + > + if (try_cnt) > + dev_dbg(ice_hw_to_dev(hw), > + "ice_aq_download_pkg failed, number of retries: > %d\n", > + try_cnt); Do you really need this dbg statement when try_cnt < 5? Is it not misleading in success case (with retries)? Thanks, Naveen > > /* Save AQ status from download package */ > if (status) { > -- > 2.40.1 >
On 16.05.2024 16:56, Simon Horman wrote: > On Thu, May 16, 2024 at 04:04:26PM +0200, Wojciech Drewek wrote: >> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due >> to FW issue. Fix this by retrying five times before moving to >> Safe Mode. > > Hi Wojciech, > > As this is for iwl-net, and therefore net, please consider supplying a > Fixes tag. Sure thing. > > Otherwise this looks good to me. > >> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> > > ...
On 16.05.2024 18:36, Brett Creeley wrote: > > > On 5/16/2024 7:04 AM, Wojciech Drewek wrote: >> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. >> >> >> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due >> to FW issue. Fix this by retrying five times before moving to >> Safe Mode. >> >> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> >> --- >> drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++-- >> 1 file changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c >> index ce5034ed2b24..19e2111fcf08 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c >> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c >> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, >> >> for (i = 0; i < count; i++) { >> bool last = false; >> + int try_cnt = 0; >> int status; >> >> bh = (struct ice_buf_hdr *)(bufs + start + i); >> @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, >> if (indicate_last) >> last = ice_is_last_download_buffer(bh, i, count); >> >> - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last, >> - &offset, &info, NULL); >> + while (try_cnt < 5) { >> + status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, >> + last, &offset, &info, >> + NULL); >> + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC && >> + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) > > Are these the only 2 sporadic errors that FW will return? Yes, that's right. We don't want to retry in case of other errors since those might be valid. > >> + break; >> + >> + try_cnt++; >> + msleep(20); >> + } >> + >> + if (try_cnt) >> + dev_dbg(ice_hw_to_dev(hw), >> + "ice_aq_download_pkg failed, number of retries: %d\n", >> + try_cnt); > > If try_cnt is non-zero it doesn't mean the last download failed, it just means one or more attempts to download failed right? Maybe just "ice_aq_download_pkg number of retries: %d" since the if (status) check below will print on failure? Sounds reasonable, we want this log only because we want to know if we hit this sporadic failure. > >> >> /* Save AQ status from download package */ >> if (status) { >> -- >> 2.40.1 >> >>
On 17.05.2024 09:26, Naveen Mamindlapalli wrote: > >> -----Original Message----- >> From: Wojciech Drewek <wojciech.drewek@intel.com> >> Sent: Thursday, May 16, 2024 7:34 PM >> To: netdev@vger.kernel.org >> Cc: intel-wired-lan@lists.osuosl.org >> Subject: [PATCH iwl-net] ice: implement AQ download pkg retry >> >> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due to FW >> issue. Fix this by retrying five times before moving to Safe Mode. >> >> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> >> --- >> drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++-- >> 1 file changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c >> b/drivers/net/ethernet/intel/ice/ice_ddp.c >> index ce5034ed2b24..19e2111fcf08 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c >> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c >> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct >> ice_buf *bufs, u32 start, >> >> for (i = 0; i < count; i++) { >> bool last = false; >> + int try_cnt = 0; >> int status; >> >> bh = (struct ice_buf_hdr *)(bufs + start + i); @@ -1346,8 >> +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf >> *bufs, u32 start, >> if (indicate_last) >> last = ice_is_last_download_buffer(bh, i, count); >> >> - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, >> last, >> - &offset, &info, NULL); >> + while (try_cnt < 5) { >> + status = ice_aq_download_pkg(hw, bh, >> ICE_PKG_BUF_SIZE, >> + last, &offset, &info, >> + NULL); >> + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC >> && >> + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) >> + break; >> + >> + try_cnt++; >> + msleep(20); >> + } >> + >> + if (try_cnt) >> + dev_dbg(ice_hw_to_dev(hw), >> + "ice_aq_download_pkg failed, number of retries: >> %d\n", >> + try_cnt); > > Do you really need this dbg statement when try_cnt < 5? Is it not misleading in success case (with retries)? Agree. I'm going to change it to "ice_aq_download_pkg number of retries: %d" as Brett suggested. > > Thanks, > Naveen > >> >> /* Save AQ status from download package */ >> if (status) { >> -- >> 2.40.1 >> >
On 5/17/24 09:49, Wojciech Drewek wrote: > > > On 16.05.2024 18:36, Brett Creeley wrote: >> >> >> On 5/16/2024 7:04 AM, Wojciech Drewek wrote: >>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. >>> >>> >>> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due >>> to FW issue. Fix this by retrying five times before moving to >>> Safe Mode. >>> >>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> >>> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> >>> --- >>> drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++-- >>> 1 file changed, 17 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c >>> index ce5034ed2b24..19e2111fcf08 100644 >>> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c >>> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c >>> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, >>> >>> for (i = 0; i < count; i++) { >>> bool last = false; >>> + int try_cnt = 0; >>> int status; >>> >>> bh = (struct ice_buf_hdr *)(bufs + start + i); >>> @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, >>> if (indicate_last) >>> last = ice_is_last_download_buffer(bh, i, count); >>> >>> - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last, >>> - &offset, &info, NULL); >>> + while (try_cnt < 5) { >>> + status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, >>> + last, &offset, &info, >>> + NULL); >>> + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC && >>> + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) >> >> Are these the only 2 sporadic errors that FW will return? > > Yes, that's right. We don't want to retry in case of other errors since those might be valid. I would say that those are the only two non-sporadic errors ;) > >> >>> + break; >>> + >>> + try_cnt++; >>> + msleep(20); >>> + } >>> + >>> + if (try_cnt) >>> + dev_dbg(ice_hw_to_dev(hw), >>> + "ice_aq_download_pkg failed, number of retries: %d\n", s/retries/attempts/ (as retries = attempts + 1 ;)) >>> + try_cnt); >> >> If try_cnt is non-zero it doesn't mean the last download failed, it just means one or more attempts to download failed right? Maybe just "ice_aq_download_pkg number of retries: %d" since the if (status) check below will print on failure? > > Sounds reasonable, we want this log only because we want to know if we hit this sporadic failure. > >> >>> >>> /* Save AQ status from download package */ >>> if (status) { >>> -- >>> 2.40.1 >>> >>>
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c index ce5034ed2b24..19e2111fcf08 100644 --- a/drivers/net/ethernet/intel/ice/ice_ddp.c +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, for (i = 0; i < count; i++) { bool last = false; + int try_cnt = 0; int status; bh = (struct ice_buf_hdr *)(bufs + start + i); @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start, if (indicate_last) last = ice_is_last_download_buffer(bh, i, count); - status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last, - &offset, &info, NULL); + while (try_cnt < 5) { + status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, + last, &offset, &info, + NULL); + if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC && + hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG) + break; + + try_cnt++; + msleep(20); + } + + if (try_cnt) + dev_dbg(ice_hw_to_dev(hw), + "ice_aq_download_pkg failed, number of retries: %d\n", + try_cnt); /* Save AQ status from download package */ if (status) {