diff mbox series

[iwl-net,v3,2/6] igc: Lengthen the hardware retry time to prevent timeouts

Message ID 20241106184722.17230-3-christopher.s.hall@intel.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series igc: Fix PTM timeout | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 3 blamed authors not CCed: sasha.neftin@intel.com anthony.l.nguyen@intel.com muhammad.husaini.zulkifli@intel.com; 8 maintainers not CCed: andrew+netdev@lunn.ch anthony.l.nguyen@intel.com pabeni@redhat.com edumazet@google.com przemyslaw.kitszel@intel.com muhammad.husaini.zulkifli@intel.com sasha.neftin@intel.com kuba@kernel.org
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Hall, Christopher S Nov. 6, 2024, 6:47 p.m. UTC
Lengthen the hardware retry timer to four microseconds.

The i225/i226 hardware retries if it receives an inappropriate response
from the upstream device. If the device retries too quickly, the root
port does not respond.

The issue can be reproduced with the following:

$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m

Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.

PHC2SYS exits with:

"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
  fails

Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Paul Menzel Nov. 6, 2024, 11:13 p.m. UTC | #1
Dear Christopher,


Thank you for the patch.

I’d use the more specific summary/title below:

igc: Lengthen hardware retry time to 4 μs to prevent timeouts

Am 06.11.24 um 19:47 schrieb Christopher S M Hall:
> Lengthen the hardware retry timer to four microseconds.
> 
> The i225/i226 hardware retries if it receives an inappropriate response
> from the upstream device. If the device retries too quickly, the root
> port does not respond.

Any idea why? Is it documented somewhere?

> The issue can be reproduced with the following:
> 
> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
> 
> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
> quickly reproduce the issue.
> 
> PHC2SYS exits with:
> 
> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
>    fails

Why four microseconds, and not some other value?

> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
> Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
> Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
> ---
>   drivers/net/ethernet/intel/igc/igc_defines.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index 2ff292f5f63b..84521a4c35b4 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -574,7 +574,7 @@
>   #define IGC_PTM_CTRL_SHRT_CYC(usec)	(((usec) & 0x3f) << 2)
>   #define IGC_PTM_CTRL_PTM_TO(usec)	(((usec) & 0xff) << 8)
>   
> -#define IGC_PTM_SHORT_CYC_DEFAULT	1   /* Default short cycle interval */
> +#define IGC_PTM_SHORT_CYC_DEFAULT	4   /* Default short cycle interval */
>   #define IGC_PTM_CYC_TIME_DEFAULT	5   /* Default PTM cycle time */
>   #define IGC_PTM_TIMEOUT_DEFAULT		255 /* Default timeout for PTM errors */


Kind regards,

Paul
Hall, Christopher S Nov. 6, 2024, 11:53 p.m. UTC | #2
Hi Paul,

> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Sent: Wednesday, November 06, 2024 3:14 PM


> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the
> hardware retry time to prevent timeouts
> 
> Dear Christopher,
> 
> 
> Thank you for the patch.
> 
> I’d use the more specific summary/title below:

Will do.

> 
> igc: Lengthen hardware retry time to 4 μs to prevent timeouts
> 
> Am 06.11.24 um 19:47 schrieb Christopher S M Hall:
> > Lengthen the hardware retry timer to four microseconds.
> >
> > The i225/i226 hardware retries if it receives an inappropriate response
> > from the upstream device. If the device retries too quickly, the root
> > port does not respond.
> 
> Any idea why? Is it documented somewhere?

I do not. Theoretically, 1 us should work, but it does not. It could be a root
port problem or an issue with i225/i226 NIC. I am not able to directly observe
the state of either. 4 us has worked in all my testing I am comfortable with
that value. 2 us also works, but given the limited hardware at my disposal
I doubled the value to 4 us to be safe. PTM is not time critical. Typically,
software initiates a transaction between 8 and 32 times per second. There
is no performance impact for PTM or any other function of the card. The
timeout occurs rarely, but if the retry time is too short the PTM state
machine does not recover.

> > The issue can be reproduced with the following:
> >
> > $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
> >
> > Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
> > quickly reproduce the issue.
> >
> > PHC2SYS exits with:
> >
> > "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM
> transaction
> >    fails
> 
> Why four microseconds, and not some other value?

See above.

> > Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
> >
> > -#define IGC_PTM_SHORT_CYC_DEFAULT	1   /* Default short cycle
> interval */
> > +#define IGC_PTM_SHORT_CYC_DEFAULT	4   /* Default short cycle
> interval */

> Kind regards,
> 
> Paul

Thanks,
Chris
Paul Menzel Nov. 7, 2024, 5:48 a.m. UTC | #3
[Cc: +Sasha]

Dear Christopher,


Am 07.11.24 um 00:53 schrieb Hall, Christopher S:

>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>> Sent: Wednesday, November 06, 2024 3:14 PM
> 
>> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the
>> hardware retry time to prevent timeouts

>> I’d use the more specific summary/title below:
> 
> Will do.
> 
>> igc: Lengthen hardware retry time to 4 μs to prevent timeouts
>>
>> Am 06.11.24 um 19:47 schrieb Christopher S M Hall:
>>> Lengthen the hardware retry timer to four microseconds.
>>>
>>> The i225/i226 hardware retries if it receives an inappropriate response
>>> from the upstream device. If the device retries too quickly, the root
>>> port does not respond.
>>
>> Any idea why? Is it documented somewhere?
> 
> I do not. Theoretically, 1 us should work, but it does not. It could be a root
> port problem or an issue with i225/i226 NIC. I am not able to directly observe
> the state of either. 4 us has worked in all my testing I am comfortable with
> that value. 2 us also works, but given the limited hardware at my disposal
> I doubled the value to 4 us to be safe. PTM is not time critical. Typically,
> software initiates a transaction between 8 and 32 times per second. There
> is no performance impact for PTM or any other function of the card. The
> timeout occurs rarely, but if the retry time is too short the PTM state
> machine does not recover.

Thank you for clearing this up. If it’s not time critical, why not 
revert the original patch and go back to 10 μs.

The referenced commit 6b8aa753a9f9 (igc: Decrease PTM short interval 
from 10 us to 1 us) also says, that 1 μs was suggested by the hardware 
team. Were you able to talk to them?

>>> The issue can be reproduced with the following:
>>>
>>> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
>>>
>>> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
>>> quickly reproduce the issue.
>>>
>>> PHC2SYS exits with:
>>>
>>> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
>>>     fails
>>
>> Why four microseconds, and not some other value?
> 
> See above.

It’d be great, if you extended the commit message.

>>> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
>>>
>>> -#define IGC_PTM_SHORT_CYC_DEFAULT	1   /* Default short cycle interval */
>>> +#define IGC_PTM_SHORT_CYC_DEFAULT	4   /* Default short cycle interval */

Maybe also add a comment, that 1 μs should work, but does not.


Kind regards,

Paul
Paul Menzel Nov. 7, 2024, 5:56 a.m. UTC | #4
[Cc: -Sasha, 550 #5.1.0 Address rejected.]

Am 07.11.24 um 06:48 schrieb Paul Menzel:
> [Cc: +Sasha]
> 
> Dear Christopher,
> 
> 
> Am 07.11.24 um 00:53 schrieb Hall, Christopher S:
> 
>>> From: Paul Menzel <pmenzel@molgen.mpg.de>
>>> Sent: Wednesday, November 06, 2024 3:14 PM
>>
>>> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the hardware retry time to prevent timeouts
> 
>>> I’d use the more specific summary/title below:
>>
>> Will do.
>>
>>> igc: Lengthen hardware retry time to 4 μs to prevent timeouts
>>>
>>> Am 06.11.24 um 19:47 schrieb Christopher S M Hall:
>>>> Lengthen the hardware retry timer to four microseconds.
>>>>
>>>> The i225/i226 hardware retries if it receives an inappropriate response
>>>> from the upstream device. If the device retries too quickly, the root
>>>> port does not respond.
>>>
>>> Any idea why? Is it documented somewhere?
>>
>> I do not. Theoretically, 1 us should work, but it does not. It could be a root
>> port problem or an issue with i225/i226 NIC. I am not able to directly observe
>> the state of either. 4 us has worked in all my testing I am comfortable with
>> that value. 2 us also works, but given the limited hardware at my disposal
>> I doubled the value to 4 us to be safe. PTM is not time critical. Typically,
>> software initiates a transaction between 8 and 32 times per second. There
>> is no performance impact for PTM or any other function of the card. The
>> timeout occurs rarely, but if the retry time is too short the PTM state
>> machine does not recover.
> 
> Thank you for clearing this up. If it’s not time critical, why not 
> revert the original patch and go back to 10 μs.
> 
> The referenced commit 6b8aa753a9f9 (igc: Decrease PTM short interval 
> from 10 us to 1 us) also says, that 1 μs was suggested by the hardware 
> team. Were you able to talk to them?
> 
>>>> The issue can be reproduced with the following:
>>>>
>>>> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
>>>>
>>>> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
>>>> quickly reproduce the issue.
>>>>
>>>> PHC2SYS exits with:
>>>>
>>>> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
>>>>     fails
>>>
>>> Why four microseconds, and not some other value?
>>
>> See above.
> 
> It’d be great, if you extended the commit message.
> 
>>>> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 
>>>> 1 us")
>>>>
>>>> -#define IGC_PTM_SHORT_CYC_DEFAULT    1   /* Default short cycle interval */
>>>> +#define IGC_PTM_SHORT_CYC_DEFAULT    4   /* Default short cycle interval */
> 
> Maybe also add a comment, that 1 μs should work, but does not.
> 
> 
> Kind regards,
> 
> Paul
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 2ff292f5f63b..84521a4c35b4 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -574,7 +574,7 @@ 
 #define IGC_PTM_CTRL_SHRT_CYC(usec)	(((usec) & 0x3f) << 2)
 #define IGC_PTM_CTRL_PTM_TO(usec)	(((usec) & 0xff) << 8)
 
-#define IGC_PTM_SHORT_CYC_DEFAULT	1   /* Default short cycle interval */
+#define IGC_PTM_SHORT_CYC_DEFAULT	4   /* Default short cycle interval */
 #define IGC_PTM_CYC_TIME_DEFAULT	5   /* Default PTM cycle time */
 #define IGC_PTM_TIMEOUT_DEFAULT		255 /* Default timeout for PTM errors */