Message ID | 20220128104938.2211441-2-tobias@waldekranz.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: dsa: mv88e6xxx: Improve indirect addressing performance | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 50 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Fri, Jan 28, 2022 at 11:49:37AM +0100, Tobias Waldekranz wrote: > Avoid a long delay when a busy bit is still set and has to be polled > again. > > Measurements on a system with 2 Opals (6097F) and one Agate (6352) > show that even with this much tighter loop, we have about a 50% chance > of the bit being cleared on the first poll, all other accesses see the > bit being cleared on the second poll. > > On a standard MDIO bus running MDC at 2.5MHz, a single access with 32 > bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us. > > This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in > the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow > case - bringing the average close to 1ms. > > With this change in place, the slow case is closer to 2*26us + CPU > overhead, with the average well below 100us - a 10x improvement. > > This translates to real-world winnings. On a 3-chip 20-port system, > the modprobe time drops by 88%: > > Before: > > root@coronet:~# time modprobe mv88e6xxx > real 0m 15.99s > user 0m 0.00s > sys 0m 1.52s > > After: > > root@coronet:~# time modprobe mv88e6xxx > real 0m 2.21s > user 0m 0.00s > sys 0m 1.54s > > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 58ca684d73f7..de8a568a8c53 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -86,12 +86,16 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val) int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask, u16 val) { + const unsigned long timeout = jiffies + msecs_to_jiffies(50); u16 data; int err; int i; - /* There's no bus specific operation to wait for a mask */ - for (i = 0; i < 16; i++) { + /* There's no bus specific operation to wait for a mask. Even + * if the initial poll takes longer than 50ms, always do at + * least one more attempt. + */ + for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) { err = mv88e6xxx_read(chip, addr, reg, &data); if (err) return err; @@ -99,7 +103,7 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg, if ((data & mask) == val) return 0; - usleep_range(1000, 2000); + cpu_relax(); } dev_err(chip->dev, "Timeout while waiting for switch\n"); diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c index 282fe08db050..466d2aaa9fcb 100644 --- a/drivers/net/dsa/mv88e6xxx/smi.c +++ b/drivers/net/dsa/mv88e6xxx/smi.c @@ -55,11 +55,15 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip, static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip, int dev, int reg, int bit, int val) { + const unsigned long timeout = jiffies + msecs_to_jiffies(50); u16 data; int err; int i; - for (i = 0; i < 16; i++) { + /* Even if the initial poll takes longer than 50ms, always do + * at least one more attempt. + */ + for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) { err = mv88e6xxx_smi_direct_read(chip, dev, reg, &data); if (err) return err; @@ -67,7 +71,7 @@ static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip, if (!!(data & BIT(bit)) == !!val) return 0; - usleep_range(1000, 2000); + cpu_relax(); } return -ETIMEDOUT;
Avoid a long delay when a busy bit is still set and has to be polled again. Measurements on a system with 2 Opals (6097F) and one Agate (6352) show that even with this much tighter loop, we have about a 50% chance of the bit being cleared on the first poll, all other accesses see the bit being cleared on the second poll. On a standard MDIO bus running MDC at 2.5MHz, a single access with 32 bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us. This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow case - bringing the average close to 1ms. With this change in place, the slow case is closer to 2*26us + CPU overhead, with the average well below 100us - a 10x improvement. This translates to real-world winnings. On a 3-chip 20-port system, the modprobe time drops by 88%: Before: root@coronet:~# time modprobe mv88e6xxx real 0m 15.99s user 0m 0.00s sys 0m 1.52s After: root@coronet:~# time modprobe mv88e6xxx real 0m 2.21s user 0m 0.00s sys 0m 1.54s Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> --- drivers/net/dsa/mv88e6xxx/chip.c | 10 +++++++--- drivers/net/dsa/mv88e6xxx/smi.c | 8 ++++++-- 2 files changed, 13 insertions(+), 5 deletions(-)