Message ID | 20220919110847.744712-6-mattias.forsblad@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: dsa: qca8k, mv88e6xxx: rmon: Add RMU support | expand |
On Mon, Sep 19, 2022 at 01:08:45PM +0200, Mattias Forsblad wrote: > @@ -255,6 +299,15 @@ int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip) > chip->rmu.smi_ops = chip->smi_ops; > chip->rmu.ds_ops = chip->ds->ops; > > + /* Change rmu ops with our own pointer */ > + chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops; > + chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get; The patch splitting is still so poor, that I can't even complain about this bug properly, since no one uses this pointer for now (and when it will be used, it will not be obvious how it's assigned). In short, it's called like this: static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data) { struct mv88e6xxx_chip *chip = ds->priv; chip->smi_ops->get_rmon(chip, port, data); } When the switch doesn't support RMU operations, or when the RMU setup simply failed, "ethtool -S" will dereference a very NULL pointer during that indirect call, because mv88e6xxx_rmu_setup() is unconditionally called for every switch during setup. Probably not a wise choice to start off with the RMU ops for ethtool -S. Also, not clear, if RMU fails, why we don't make an effort to fall back to MDIO for that user space request. > + > + /* Also change get stats strings for our own */ Who is "our own", and implicitly, who are they? You'd expect that there aren't tribalist factions within the same driver. > + chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops; > + chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu; > + chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu; > + Those who cast a const to a non-const pointer and proceed to modify the read-only structure behind it should go to patchwork arrest for one week. > /* Start disabled, we'll enable RMU in master_state_change */ > if (chip->info->ops->rmu_disable) > return chip->info->ops->rmu_disable(chip); > -- > 2.25.1 >
On 2022-09-20 00:49, Vladimir Oltean wrote: > On Mon, Sep 19, 2022 at 01:08:45PM +0200, Mattias Forsblad wrote: >> @@ -255,6 +299,15 @@ int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip) >> chip->rmu.smi_ops = chip->smi_ops; >> chip->rmu.ds_ops = chip->ds->ops; >> >> + /* Change rmu ops with our own pointer */ >> + chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops; >> + chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get; > > The patch splitting is still so poor, that I can't even complain about > this bug properly, since no one uses this pointer for now (and when it > will be used, it will not be obvious how it's assigned). > > In short, it's called like this: > > static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port, > uint64_t *data) > { > struct mv88e6xxx_chip *chip = ds->priv; > > chip->smi_ops->get_rmon(chip, port, data); > } > > When the switch doesn't support RMU operations, or when the RMU setup > simply failed, "ethtool -S" will dereference a very NULL pointer during > that indirect call, because mv88e6xxx_rmu_setup() is unconditionally > called for every switch during setup. Probably not a wise choice to > start off with the RMU ops for ethtool -S. > > Also, not clear, if RMU fails, why we don't make an effort to fall back > to MDIO for that user space request. > >> + >> + /* Also change get stats strings for our own */ > > Who is "our own", and implicitly, who are they? You'd expect that there > aren't tribalist factions within the same driver. > >> + chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops; >> + chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu; >> + chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu; >> + > > Those who cast a const to a non-const pointer and proceed to modify the > read-only structure behind it should go to patchwork arrest for one week. > >> /* Start disabled, we'll enable RMU in master_state_change */ >> if (chip->info->ops->rmu_disable) >> return chip->info->ops->rmu_disable(chip); >> -- >> 2.25.1 >> > This whole shebang was a suggestion from Andrew. I had a solution with mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon member? I'm not really sure on how to solve this in a better way? Suggestions any? Maybe I've misunderstood his suggestion. /Mattias
On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: > This whole shebang was a suggestion from Andrew. I had a solution with > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon > member? I'm not really sure on how to solve this in a better way? > Suggestions any? Maybe I've misunderstood his suggestion. Can you point me to the beginning of that exact suggestion? I've removed everything older than v10 from my inbox, since the flow of patches was preventing me from seeing other emails.
On 2022-09-20 15:10, Vladimir Oltean wrote: > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: >> This whole shebang was a suggestion from Andrew. I had a solution with >> mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. >> The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon >> member? I'm not really sure on how to solve this in a better way? >> Suggestions any? Maybe I've misunderstood his suggestion. > > Can you point me to the beginning of that exact suggestion? I've removed > everything older than v10 from my inbox, since the flow of patches was > preventing me from seeing other emails. https://lore.kernel.org/netdev/Yxkc6Zav7XoKRLBt@lunn.ch/ /Mattias
On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote: > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: > > This whole shebang was a suggestion from Andrew. I had a solution with > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon > > member? I'm not really sure on how to solve this in a better way? > > Suggestions any? Maybe I've misunderstood his suggestion. > > Can you point me to the beginning of that exact suggestion? I've removed > everything older than v10 from my inbox, since the flow of patches was > preventing me from seeing other emails. What i want to do is avoid code like: if (have_rmu()) foo() else bar() There is nothing common in the MDIO MIB code and the RMU MIB code, just the table of statistics. When we get to dumping the ATU, i also expect there will be little in common between the MDIO and the RMU functions. Doing MIB via RMU is a big gain, but i would also like normal register read and write to go via RMU, probably with some level of combining. Multiple writes can be combined into one RMU operation ending with a read. That should give us an mv88e6xxx_bus_ops which does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU bus_ops. But how do we mix RMU MIB and ATU dumps into this? My idea was to make them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops structures would end up call the mv88e6xxx_ops method for MIB or ATU. The rmu bus_ops and directly call an RMU function to do it. What is messy at the moment is that we don't have register read/write via RMU, so we have some horrible hybrid. We should probably just implement simple read and write, without combining, so we can skip this hybrid. I am assuming here that RMU is reliable. The QCA8K driver currently falls back to MDIO if its inband function is attempted but fails. I want to stress this part, lots of data packets and see if the RMU frames get dropped, or delayed too much causing failures. If we do see failures, is a couple of retires enough? Or do we need to fallback to MDIO which should always work? If we do need to fallback, this structure is not going to work too well. Andrew
On 2022-09-20 23:04, Andrew Lunn wrote: > On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote: >> On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: >>> This whole shebang was a suggestion from Andrew. I had a solution with >>> mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. >>> The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon >>> member? I'm not really sure on how to solve this in a better way? >>> Suggestions any? Maybe I've misunderstood his suggestion. >> >> Can you point me to the beginning of that exact suggestion? I've removed >> everything older than v10 from my inbox, since the flow of patches was >> preventing me from seeing other emails. > > What i want to do is avoid code like: > > if (have_rmu()) > foo() > else > bar() > > There is nothing common in the MDIO MIB code and the RMU MIB code, > just the table of statistics. When we get to dumping the ATU, i also > expect there will be little in common between the MDIO and the RMU > functions. > > Doing MIB via RMU is a big gain, but i would also like normal register > read and write to go via RMU, probably with some level of > combining. Multiple writes can be combined into one RMU operation > ending with a read. That should give us an mv88e6xxx_bus_ops which > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU > bus_ops. > > But how do we mix RMU MIB and ATU dumps into this? My idea was to make > them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops > structures would end up call the mv88e6xxx_ops method for MIB or > ATU. The rmu bus_ops and directly call an RMU function to do it. > > What is messy at the moment is that we don't have register read/write > via RMU, so we have some horrible hybrid. We should probably just > implement simple read and write, without combining, so we can skip > this hybrid. > > I am assuming here that RMU is reliable. The QCA8K driver currently > falls back to MDIO if its inband function is attempted but fails. I > want to stress this part, lots of data packets and see if the RMU > frames get dropped, or delayed too much causing failures. If we do see > failures, is a couple of retires enough? Or do we need to fallback to > MDIO which should always work? If we do need to fallback, this > structure is not going to work too well. > > Andrew I understand want you want but I can see a lot of risks and pitfalls with moving ordinary read and writes to RMU, which I wanted to avoid by first doing RMON dump and then dump ATU and at a later stage with a better architecture for write/read combining doing that, instead of forcing through read/writes with all associated testing it would require. Can we please do this in steps? /Mattias
> I understand want you want but I can see a lot of risks and pitfalls with moving > ordinary read and writes to RMU, which I wanted to avoid by first doing > RMON dump and then dump ATU and at a later stage with a better architecture > for write/read combining doing that, instead of forcing through read/writes > with all associated testing it would require. Can we please do this in > steps? If we are going to fall back to MDIO when RMU fails, we need a different code structure for these operations. That different code structure should also help solve the messy _ops structure stuff. RMU affects us in two different locations: ATU and MIB dump: Controlled by struct mv88e6xxx_ops register read/write: Controlled by struct mv88e6xxx_bus_ops We could add to struct mv88e6xxx_ops: int (*stats_rmu_get_sset_count)(struct mv88e6xxx_chip *chip); int (*stats_rmu_get_strings)(struct mv88e6xxx_chip *chip, uint8_t *data); int (*stats_rmu_get_stats)(struct mv88e6xxx_chip *chip, int port, uint64_t *data); and then mv88e6xxx_get_stats() would become something like: static void mv88e6xxx_get_stats(struct mv88e6xxx_chip *chip, int port, uint64_t *data) { int count = 0; int err; if (chip->info->ops->stats_rmu_get_stats && mv88e6xxx_rmu_enabled(chip)) { err = chip->info->ops->stats_rmu_get_stats(chip, port, data) if (!err) return; } if (chip->info->ops->stats_get_stats) count = chip->info->ops->stats_get_stats(chip, port, data); We then get fall back to MDIO, and clean, separate implementations of RMU and MDIO stats operations. I hope the ATU/fdb dump can be done in a similar way. register read/writes, we probably need to extend mv88e6xxx_smi_read() and mv88e6xxx_smi_write(). Try RMU first, and then fall back to MDIO. Please try something in this direction. But please, lots of small, simple patches with good commit messages. Andrew
On Tue, Sep 20, 2022 at 11:04:07PM +0200, Andrew Lunn wrote: > On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote: > > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: > > > This whole shebang was a suggestion from Andrew. I had a solution with > > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. > > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon > > > member? I'm not really sure on how to solve this in a better way? > > > Suggestions any? Maybe I've misunderstood his suggestion. > > > > Can you point me to the beginning of that exact suggestion? I've removed > > everything older than v10 from my inbox, since the flow of patches was > > preventing me from seeing other emails. > > What i want to do is avoid code like: > > if (have_rmu()) > foo() > else > bar() > > There is nothing common in the MDIO MIB code and the RMU MIB code, > just the table of statistics. When we get to dumping the ATU, i also > expect there will be little in common between the MDIO and the RMU > functions. Sorry, I don't understand what it is about "if (have_rmu()) foo() else bar()" that you don't like. Isn't the indirection via bus_ops the exact same thing, but expressed as an indirect function call rather than an if/else? Or is it that the if/else structure precludes the calling of bar() when foo() fails? The bus_ops will suffer from the same problem. > Doing MIB via RMU is a big gain, but i would also like normal register > read and write to go via RMU, probably with some level of > combining. Multiple writes can be combined into one RMU operation > ending with a read. That should give us an mv88e6xxx_bus_ops which > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU > bus_ops. At what level would the combining be done? I think the mv88e6xxx doesn't really make use of bulk operations, C45 MDIO reads with post-increment, that sort of thing. I could be wrong. And at some higher level, the register read/write code should not diverge (too much), even if the operation may be done over Ethernet or MDIO. So we need to find places which actually make useful sense of bulk reads. > But how do we mix RMU MIB and ATU dumps into this? My idea was to make > them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops > structures would end up call the mv88e6xxx_ops method for MIB or > ATU. The rmu bus_ops and directly call an RMU function to do it. > > What is messy at the moment is that we don't have register read/write > via RMU, so we have some horrible hybrid. We should probably just > implement simple read and write, without combining, so we can skip > this hybrid. I see in the manual that there are some registers which aren't available or not recommended over RMU, for example the PHY registers if the PPU is disabled, or phylink methods for the upstream facing ports. There are also less obvious things, like accessing the PTP clock gettime(). This will surely change the delay characteristic that phc2sys sees, I'm not sure if for the better or for the worse, but the SMI code at least had some tuning made to it, so people might care. So bottom line, I'm not sure whether we do enough to prevent these pitfalls by simply creating blanket replacements for all register reads/writes and not inspecting whether the code is fine after we do that. On the other hand, having RMU operations might also bring subtle benefits to phc2sys. I think there were some issues with the PTP code having trouble getting MDIO bus access (because of bridge fdb dumps or some other things happening in the background). This may also be an opportunity to have parallel access to independent IP blocks within the switch, one goes through MDIO and the other through Ethernet. But then, Mattias' code structure becomes inadequate. Currently we serialize mv88e6xxx_master_state_change() with respect to bus accesses via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with MDIO, we need a rwlock, such that multiple 'readers' of the conceptual have_rmu() function can run in parallel with each other, and just serialize with the RMU state changes (the 'writers'). > I am assuming here that RMU is reliable. The QCA8K driver currently > falls back to MDIO if its inband function is attempted but fails. I > want to stress this part, lots of data packets and see if the RMU > frames get dropped, or delayed too much causing failures. I don't think you even have to stress it too much. Nothing prevents the user from putting a policer on the DSA master which will randomly drop responses. Or a shaper that will delay requests beyond the timeout. > If we do see > failures, is a couple of retires enough? Or do we need to fallback to > MDIO which should always work? If we do need to fallback, this > structure is not going to work too well. Consider our previous discussion about the switchdev prepare/commit transactional structure, where the commit stage is not supposed to fail even if it writes to hardware. You said that the writes are supposed to be reliable, or else. Either they all work or none do. Not sure how that is going to hold up with a transport such as Ethernet which has such a wide arsenal of foot guns. I think that leaving the MDIO fallback in is inevitable. In terms of retries, I'm not sure. With the qca8k code structure: if (have_rmu()) { ret = foo(); if (ret == 0) return 0; } return bar(); we won't have retries for the _current_ operation, but all further operations will still try to use Ethernet first. So here, it seems to me that the timeout needs to be tuned such that everything does not grind down to a halt even if we have a lossy Ethernet channel. Otherwise, we need to build some more (perhaps too advanced) logic. Have "if (have_rmu() && rmu_works())", periodically re-check if RMU works after a failure, etc.
On Thu, Sep 22, 2022 at 02:48:20PM +0300, Vladimir Oltean wrote: > On Tue, Sep 20, 2022 at 11:04:07PM +0200, Andrew Lunn wrote: > > On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote: > > > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote: > > > > This whole shebang was a suggestion from Andrew. I had a solution with > > > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of. > > > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon > > > > member? I'm not really sure on how to solve this in a better way? > > > > Suggestions any? Maybe I've misunderstood his suggestion. > > > > > > Can you point me to the beginning of that exact suggestion? I've removed > > > everything older than v10 from my inbox, since the flow of patches was > > > preventing me from seeing other emails. > > > > What i want to do is avoid code like: > > > > if (have_rmu()) > > foo() > > else > > bar() > > > > There is nothing common in the MDIO MIB code and the RMU MIB code, > > just the table of statistics. When we get to dumping the ATU, i also > > expect there will be little in common between the MDIO and the RMU > > functions. > > Sorry, I don't understand what it is about "if (have_rmu()) foo() else bar()" > that you don't like. Isn't the indirection via bus_ops the exact same > thing, but expressed as an indirect function call rather than an if/else? There was code like this deep inside the MIB code, which just looked ugly. That is now gone. > > Doing MIB via RMU is a big gain, but i would also like normal register > > read and write to go via RMU, probably with some level of > > combining. Multiple writes can be combined into one RMU operation > > ending with a read. That should give us an mv88e6xxx_bus_ops which > > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU > > bus_ops. > > At what level would the combining be done? I think the mv88e6xxx doesn't > really make use of bulk operations, C45 MDIO reads with post-increment, > that sort of thing. I could be wrong. And at some higher level, the > register read/write code should not diverge (too much), even if the > operation may be done over Ethernet or MDIO. So we need to find places > which actually make useful sense of bulk reads. I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a buffer for building requests. Each write call appends the write to the buffer and returns 0. A read call gets appended to the buffer and then executes the RMU. We probably also need to wrap the reg mutex, so that when it is released, any buffered writes get executed. If the RMU fails, we have all the information needed to do the same via MDIO. What i was not aware of is that some registers are not supposed to be accessed over RMU. I suppose we can make a list of them, and if there is a read/write to such a register, execute the RMU and then perform an MDIO operation for the restricted register. > But then, Mattias' code structure becomes inadequate. Currently we > serialize mv88e6xxx_master_state_change() with respect to bus accesses > via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with > MDIO, we need a rwlock, such that multiple 'readers' of the conceptual > have_rmu() function can run in parallel with each other, and just > serialize with the RMU state changes (the 'writers'). I don't think we can allow RMU to run in parallel to MDIO. The reg lock will probably prevent that anyway. > > > I am assuming here that RMU is reliable. The QCA8K driver currently > > falls back to MDIO if its inband function is attempted but fails. I > > want to stress this part, lots of data packets and see if the RMU > > frames get dropped, or delayed too much causing failures. > > I don't think you even have to stress it too much. Nothing prevents the > user from putting a policer on the DSA master which will randomly drop > responses. Or a shaper that will delay requests beyond the timeout. That would be a self inflicted problem. But you are correct, we need to fall back to MDIO. This is one area we can experiment with. Maybe we can retry the operation via RMU a few times? Two retries for MIBs is still going to be a lot faster, if successful, compared to all the MDIO transactions for all the statistics. We can also add some fall back tracking logic. If RMU has failed for N times in a row, stop using it for 60 seconds, etc. That might be something we can put into the DSA core, since it seems like a generic problem. There is a lot of experimentation needed here, and it could be we need to throw it all away and try again a few times until we have explored the problem sufficiently to get it right... Andrew
On Thu, Sep 22, 2022 at 02:45:34PM +0200, Andrew Lunn wrote: > > > Doing MIB via RMU is a big gain, but i would also like normal register > > > read and write to go via RMU, probably with some level of > > > combining. Multiple writes can be combined into one RMU operation > > > ending with a read. That should give us an mv88e6xxx_bus_ops which > > > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU > > > bus_ops. > > > > At what level would the combining be done? I think the mv88e6xxx doesn't > > really make use of bulk operations, C45 MDIO reads with post-increment, > > that sort of thing. I could be wrong. And at some higher level, the > > register read/write code should not diverge (too much), even if the > > operation may be done over Ethernet or MDIO. So we need to find places > > which actually make useful sense of bulk reads. > > I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a > buffer for building requests. Each write call appends the write to the > buffer and returns 0. A read call gets appended to the buffer and then > executes the RMU. We probably also need to wrap the reg mutex, so that > when it is released, any buffered writes get executed. If the RMU > fails, we have all the information needed to do the same via MDIO. Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit write barrier. That could work, but the trouble seems to be error propagation. mv88e6xxx_write() will always return 0, the operation will be delayed until the unlock, and mv88e6xxx_reg_unlock() does not return an error code (why would it?). > > But then, Mattias' code structure becomes inadequate. Currently we > > serialize mv88e6xxx_master_state_change() with respect to bus accesses > > via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with > > MDIO, we need a rwlock, such that multiple 'readers' of the conceptual > > have_rmu() function can run in parallel with each other, and just > > serialize with the RMU state changes (the 'writers'). > > I don't think we can allow RMU to run in parallel to MDIO. The reg > lock will probably prevent that anyway. Well, I was thinking the locking could get rearchitected, but it seems you have bigger plans for it, so it becomes even more engrained in the driver :) > > > I am assuming here that RMU is reliable. The QCA8K driver currently > > > falls back to MDIO if its inband function is attempted but fails. I > > > want to stress this part, lots of data packets and see if the RMU > > > frames get dropped, or delayed too much causing failures. > > > > I don't think you even have to stress it too much. Nothing prevents the > > user from putting a policer on the DSA master which will randomly drop > > responses. Or a shaper that will delay requests beyond the timeout. > > That would be a self inflicted problem. But you are correct, we need > to fall back to MDIO. Here's one variation which is really not self inflicted. You have a 10G CPU port, and 1G user ports. You use flow control on the DSA master to avoid packet loss due to the 10G->1G rate adaptation. So the DSA master goes periodically through states of TX congestion and holds back frames until it goes away. This creates latency for packets in the TX queues, including RMU requests, even if the RMU messages don't go to the external ports. And even with a high skb->priority, you'd still need PFC to avoid this problem. This can trip up the timeout timers we have for RMU responses. > This is one area we can experiment with. Maybe we can retry the > operation via RMU a few times? Two retries for MIBs is still going to > be a lot faster, if successful, compared to all the MDIO transactions > for all the statistics. We can also add some fall back tracking > logic. If RMU has failed for N times in a row, stop using it for 60 > seconds, etc. That might be something we can put into the DSA core, > since it seems like a generic problem. Or the driver might have a worker which periodically sends the GetID message and tracks whether the switch responded. Maybe the rescheduling intervals of that are dynamically adjusted based on feedback from timeouts or successes of register reads/writes. In any case, now we're starting to talk about really complex logic. And it's not clear how effective any of these mechanisms would be against random and sporadic timeouts rather than persistent issues.
> > I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a > > buffer for building requests. Each write call appends the write to the > > buffer and returns 0. A read call gets appended to the buffer and then > > executes the RMU. We probably also need to wrap the reg mutex, so that > > when it is released, any buffered writes get executed. If the RMU > > fails, we have all the information needed to do the same via MDIO. > > Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit > write barrier. I'm still thinking this through. It probably needs real code to get all the details sorted out. The locking could be interesting... We need something to flush the queue before we exit from the driver, and that seems like the obvious synchronisation point. If not that, we need to add another function call at all the exit points. > That could work, but the trouble seems to be error propagation. > mv88e6xxx_write() will always return 0, the operation will be delayed > until the unlock, and mv88e6xxx_reg_unlock() does not return an error > code (why would it?). If the RMU fails and the fallback MDIO also fails, we are in big trouble, and the switch is probably dead. At which point, do we really care. netdev_ratelimited_err('Switch has died...') and keep going, everything afterwards is probably going to go wrong as well. I don't think there are any instances in the driver where we try to recover from an MDIO write failure, other than return -ETIMEDOUT or -EIO or whatever. > Or the driver might have a worker which periodically sends the GetID > message and tracks whether the switch responded. Maybe the rescheduling > intervals of that are dynamically adjusted based on feedback from > timeouts or successes of register reads/writes. In any case, now we're > starting to talk about really complex logic. And it's not clear how > effective any of these mechanisms would be against random and sporadic > timeouts rather than persistent issues. I don't think we can assume every switch has an equivalent of GetID. So a solution like this would have to be per driver. Given the potential complexity, it would probably be better to have it in the core, everybody shares it, debugs it, and makes sure it works well. Andrew
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 294bf9bbaf3f..5b22fe4b3284 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -1229,6 +1229,80 @@ static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port, int sset) return count; } +int mv88e6xxx_stats_get_sset_count_rmu(struct dsa_switch *ds, int port, int sset) +{ + if (sset != ETH_SS_STATS) + return 0; + + return ARRAY_SIZE(mv88e6xxx_hw_stats); +} + +void mv88e6xxx_stats_get_strings_rmu(struct dsa_switch *ds, int port, + u32 stringset, uint8_t *data) +{ + struct mv88e6xxx_hw_stat *stat; + int i; + + if (stringset != ETH_SS_STATS) + return; + + for (i = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) { + stat = &mv88e6xxx_hw_stats[i]; + memcpy(data + i * ETH_GSTRING_LEN, stat->string, ETH_GSTRING_LEN); + } +} + +int mv88e6xxx_state_get_stats_rmu(struct mv88e6xxx_chip *chip, int port, + __be32 *raw_cnt, int num_cnt, uint64_t *data) +{ + struct mv88e6xxx_hw_stat *stat; + int offset = 0; + u64 high; + int i, j; + + for (i = 0, j = 0; i < ARRAY_SIZE(mv88e6xxx_hw_stats); i++) { + stat = &mv88e6xxx_hw_stats[i]; + offset = stat->reg; + + if (stat->type & STATS_TYPE_PORT) { + /* The offsets for the port counters below + * are different in the RMU packet. + */ + switch (stat->reg) { + case 0x10: /* sw_in_discards */ + offset = 0x81; + break; + case 0x12: /* sw_in_filtered */ + offset = 0x85; + break; + case 0x13: /* sw_out_filtered */ + offset = 0x89; + break; + default: + dev_err(chip->dev, + "RMU: port %d wrong offset requested: %d\n", + port, stat->reg); + return j; + } + } else if (stat->type & STATS_TYPE_BANK1) { + offset += 32; + } + + if (offset >= num_cnt) + continue; + + data[j] = be32_to_cpu(raw_cnt[offset]); + + if (stat->size == 8) { + high = be32_to_cpu(raw_cnt[offset + 1]); + data[j] += (high << 32); + } + + j++; + } + return j; +} + static int mv88e6xxx_stats_get_stats(struct mv88e6xxx_chip *chip, int port, uint64_t *data, int types, u16 bank1_select, u16 histogram) diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h index 440e9b274df4..aca7cfef196e 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.h +++ b/drivers/net/dsa/mv88e6xxx/chip.h @@ -433,6 +433,7 @@ struct mv88e6xxx_bus_ops { int (*read)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val); int (*write)(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val); int (*init)(struct mv88e6xxx_chip *chip); + void (*get_rmon)(struct mv88e6xxx_chip *chip, int port, uint64_t *data); }; struct mv88e6xxx_mdio_bus { @@ -822,4 +823,9 @@ static inline void mv88e6xxx_reg_unlock(struct mv88e6xxx_chip *chip) int mv88e6xxx_fid_map(struct mv88e6xxx_chip *chip, unsigned long *bitmap); +int mv88e6xxx_state_get_stats_rmu(struct mv88e6xxx_chip *chip, int port, + __be32 *raw_cnt, int num_cnt, uint64_t *data); +int mv88e6xxx_stats_get_sset_count_rmu(struct dsa_switch *ds, int port, int sset); +void mv88e6xxx_stats_get_strings_rmu(struct dsa_switch *ds, int port, + u32 stringset, uint8_t *data); #endif /* _MV88E6XXX_CHIP_H */ diff --git a/drivers/net/dsa/mv88e6xxx/rmu.c b/drivers/net/dsa/mv88e6xxx/rmu.c index c5b3c156de40..129535777c4b 100644 --- a/drivers/net/dsa/mv88e6xxx/rmu.c +++ b/drivers/net/dsa/mv88e6xxx/rmu.c @@ -137,6 +137,50 @@ static int mv88e6xxx_rmu_get_id(struct mv88e6xxx_chip *chip, int port) return ret; } +static void mv88e6xxx_rmu_stats_get(struct mv88e6xxx_chip *chip, int port, uint64_t *data) +{ + u16 req[4] = { MV88E6XXX_RMU_REQ_FORMAT_SOHO, + MV88E6XXX_RMU_REQ_PAD, + MV88E6XXX_RMU_REQ_CODE_DUMP_MIB, + MV88E6XXX_RMU_REQ_DATA}; + struct mv88e6xxx_dump_mib_resp resp; + struct mv88e6xxx_port *p; + u8 resp_port; + int resp_len; + int num_mibs; + int ret; + + /* Populate port number in request */ + req[3] = FIELD_PREP(MV88E6XXX_RMU_REQ_DUMP_MIB_PORT_MASK, port); + + resp_len = sizeof(resp); + ret = mv88e6xxx_rmu_send_wait(chip, req, sizeof(req), + &resp, &resp_len); + if (ret) { + dev_dbg(chip->dev, "RMU: error for command DUMP_MIB %pe port %d\n", + ERR_PTR(ret), port); + return; + } + + /* Got response */ + ret = mv88e6xxx_rmu_validate_response(&resp.rmu_header, MV88E6XXX_RMU_RESP_CODE_DUMP_MIB); + if (ret) + return; + + resp_port = FIELD_GET(MV88E6XXX_SOURCE_PORT, resp.portnum); + p = &chip->ports[resp_port]; + if (!p) { + dev_err_ratelimited(chip->dev, "RMU: illegal port number in response: %d\n", + resp_port); + return; + } + + /* Update MIB for port */ + num_mibs = (resp_len - offsetof(struct mv88e6xxx_dump_mib_resp, mib)) / sizeof(__be32); + + mv88e6xxx_state_get_stats_rmu(chip, port, resp.mib, num_mibs, data); +} + static void mv88e6xxx_disable_rmu(struct mv88e6xxx_chip *chip) { chip->smi_ops = chip->rmu.smi_ops; @@ -255,6 +299,15 @@ int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip) chip->rmu.smi_ops = chip->smi_ops; chip->rmu.ds_ops = chip->ds->ops; + /* Change rmu ops with our own pointer */ + chip->rmu.smi_rmu_ops = (struct mv88e6xxx_bus_ops *)chip->rmu.smi_ops; + chip->rmu.smi_rmu_ops->get_rmon = mv88e6xxx_rmu_stats_get; + + /* Also change get stats strings for our own */ + chip->rmu.ds_rmu_ops = (struct dsa_switch_ops *)chip->ds->ops; + chip->rmu.ds_rmu_ops->get_sset_count = mv88e6xxx_stats_get_sset_count_rmu; + chip->rmu.ds_rmu_ops->get_strings = mv88e6xxx_stats_get_strings_rmu; + /* Start disabled, we'll enable RMU in master_state_change */ if (chip->info->ops->rmu_disable) return chip->info->ops->rmu_disable(chip);
It's possible to access the RMON counters via RMU. Add functionality to send DUMP_MIB frames. Signed-off-by: Mattias Forsblad <mattias.forsblad@gmail.com> --- drivers/net/dsa/mv88e6xxx/chip.c | 74 ++++++++++++++++++++++++++++++++ drivers/net/dsa/mv88e6xxx/chip.h | 6 +++ drivers/net/dsa/mv88e6xxx/rmu.c | 53 +++++++++++++++++++++++ 3 files changed, 133 insertions(+)