Message ID | 20200519211531.3702593-1-kuba@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,1/2] devlink: add simple fw crash helpers | expand |
On Tue, May 19, 2020 at 02:15:30PM -0700, Jakub Kicinski wrote: > Add infra for creating devlink instances for a device to report Thanks for doing this series as a PoC, counter to the module_firmware_crash() which I proposed to taint the kernel with a firmware crash flag to the kernel and module. For those not famliar about devlink: https://lwn.net/Articles/677967/ https://www.kernel.org/doc/html/latest/networking/devlink/index.html The github page also is now 404 as Jiri merged that stuff into iproute2: git://git.kernel.org/pub/scm/network/iproute2/iproute2.git > fw crashes. This patch expects the devlink instance to be registered > at probe time. I belive to be the cleanest. We can also add a devm > version of the helpers, so that we don't have to do the clean up. > Or we can go even further and register the devlink instance only > once error has happened (for the first time, then we can just > find out if already registered by traversing the list like we > do here). > > With the patch applied and a sample driver converted we get: > > $ devlink dev > pci/0000:07:00.0 > > Then monitor for errors: > > $ devlink mon health > [health,status] pci/0000:07:00.0: > reporter fw > state error error 1 recover 0 > [health,status] pci/0000:07:00.0: > reporter fw > state error error 2 recover 0 > > These are the events I triggered on purpose. One can also inspect > the health of all devices capable of reporting fw errors: > > $ devlink health > pci/0000:07:00.0: > reporter fw > state error error 7 recover 0 > > Obviously drivers may upgrade to the full devlink health API > which includes state dump, state dump auto-collect and automatic > error recovery control. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > include/linux/devlink.h | 11 +++ > net/core/Makefile | 2 +- > net/core/devlink_simple_fw_reporter.c | 101 ++++++++++++++++++++++++++ > 3 files changed, 113 insertions(+), 1 deletion(-) > create mode 100644 include/linux/devlink.h > create mode 100644 net/core/devlink_simple_fw_reporter.c > > diff --git a/include/linux/devlink.h b/include/linux/devlink.h > new file mode 100644 > index 000000000000..2b73987eefca > --- /dev/null > +++ b/include/linux/devlink.h > @@ -0,0 +1,11 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +#ifndef _LINUX_DEVLINK_H_ > +#define _LINUX_DEVLINK_H_ > + > +struct device; > + > +void devlink_simple_fw_reporter_prepare(struct device *dev); > +void devlink_simple_fw_reporter_cleanup(struct device *dev); > +void devlink_simple_fw_reporter_report_crash(struct device *dev); > + > +#endif > diff --git a/net/core/Makefile b/net/core/Makefile > index 3e2c378e5f31..6f1513781c17 100644 > --- a/net/core/Makefile > +++ b/net/core/Makefile > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o > obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o > obj-$(CONFIG_DST_CACHE) += dst_cache.o > obj-$(CONFIG_HWBM) += hwbm.o > -obj-$(CONFIG_NET_DEVLINK) += devlink.o > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o This was looking super sexy up to here. This is networking specific. We want something generic for *anything* that requests firmware. I'm afraid this won't work for something generic. I don't think its throw-away work though, the idea to provide a generic interface to dump firmware through netlink might be nice for networking, or other things. But I have a feeling we'll want something still more generic than this. So networking may want to be aware that a firmware crash happened as part of this network device health thing, but firmware crashing is a generic thing. I have now extended my patch set to include uvents and I am more set on that we need the taint now more than ever. Luis > obj-$(CONFIG_GRO_CELLS) += gro_cells.o > obj-$(CONFIG_FAILOVER) += failover.o > obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o > diff --git a/net/core/devlink_simple_fw_reporter.c b/net/core/devlink_simple_fw_reporter.c > new file mode 100644 > index 000000000000..48dde9123c3c > --- /dev/null > +++ b/net/core/devlink_simple_fw_reporter.c > @@ -0,0 +1,101 @@ > +#include <linux/devlink.h> > +#include <linux/list.h> > +#include <linux/mutex.h> > +#include <net/devlink.h> > + > +struct devlink_simple_fw_reporter { > + struct list_head list; > + struct devlink_health_reporter *reporter; > +}; > + > + > +static LIST_HEAD(devlink_simple_fw_reporters); > +static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex); > + > +static const struct devlink_health_reporter_ops simple_devlink_health = { > + .name = "fw", > +}; > + > +static const struct devlink_ops simple_devlink_ops = { > +}; > + > +static struct devlink_simple_fw_reporter * > +devlink_simple_fw_reporter_find_for_dev(struct device *dev) > +{ > + struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL; > + struct devlink *devlink; > + > + mutex_lock(&devlink_simple_fw_reporters_mutex); > + list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters, > + list) { > + devlink = priv_to_devlink(simple_devlink); > + if (devlink->dev == dev) { > + ret = simple_devlink; > + break; > + } > + } > + mutex_unlock(&devlink_simple_fw_reporters_mutex); > + > + return ret; > +} > + > +void devlink_simple_fw_reporter_report_crash(struct device *dev) > +{ > + struct devlink_simple_fw_reporter *simple_devlink; > + > + simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev); > + if (!simple_devlink) > + return; > + > + devlink_health_report(simple_devlink->reporter, "firmware crash", NULL); > +} > +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_report_crash); > + > +void devlink_simple_fw_reporter_prepare(struct device *dev) > +{ > + struct devlink_simple_fw_reporter *simple_devlink; > + struct devlink *devlink; > + > + devlink = devlink_alloc(&simple_devlink_ops, > + sizeof(struct devlink_simple_fw_reporter)); > + if (!devlink) > + return; > + > + if (devlink_register(devlink, dev)) > + goto err_free; > + > + simple_devlink = devlink_priv(devlink); > + simple_devlink->reporter = > + devlink_health_reporter_create(devlink, &simple_devlink_health, > + 0, NULL); > + if (IS_ERR(simple_devlink->reporter)) > + goto err_unregister; > + > + mutex_lock(&devlink_simple_fw_reporters_mutex); > + list_add_tail(&simple_devlink->list, &devlink_simple_fw_reporters); > + mutex_unlock(&devlink_simple_fw_reporters_mutex); > + > + return; > + > +err_unregister: > + devlink_unregister(devlink); > +err_free: > + devlink_free(devlink); > +} > +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_prepare); > + > +void devlink_simple_fw_reporter_cleanup(struct device *dev) > +{ > + struct devlink_simple_fw_reporter *simple_devlink; > + struct devlink *devlink; > + > + simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev); > + if (!simple_devlink) > + return; > + > + devlink = priv_to_devlink(simple_devlink); > + devlink_health_reporter_destroy(simple_devlink->reporter); > + devlink_unregister(devlink); > + devlink_free(devlink); > +} > +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_cleanup); > -- > 2.25.4 >
On Fri, 22 May 2020 05:20:46 +0000 Luis Chamberlain wrote: > > diff --git a/net/core/Makefile b/net/core/Makefile > > index 3e2c378e5f31..6f1513781c17 100644 > > --- a/net/core/Makefile > > +++ b/net/core/Makefile > > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o > > obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o > > obj-$(CONFIG_DST_CACHE) += dst_cache.o > > obj-$(CONFIG_HWBM) += hwbm.o > > -obj-$(CONFIG_NET_DEVLINK) += devlink.o > > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o > > This was looking super sexy up to here. This is networking specific. > We want something generic for *anything* that requests firmware. You can't be serious. It's network specific because of how the Kconfig is named? Working for a company operating large data centers I would strongly prefer if we didn't have ten different ways of reporting firmware problems in the fleet. > I'm afraid this won't work for something generic. I don't think its > throw-away work though, the idea to provide a generic interface to > dump firmware through netlink might be nice for networking, or other > things. > > But I have a feeling we'll want something still more generic than this. Please be specific. Saying generic a lot is not helpful. The code (as you can see in this patch) is in no way network specific. Or are you saying there are machines out there running without netlink sockets? > So networking may want to be aware that a firmware crash happened as > part of this network device health thing, but firmware crashing is a > generic thing. > > I have now extended my patch set to include uvents and I am more set on > that we need the taint now more than ever. Please expect my nack if you're trying to add this to networking drivers. The irony is you have a problem with a networking device and all the devices your initial set touched are networking. Two of the drivers you touched either have or will soon have devlink health reporters implemented.
On Fri, 2020-05-22 at 10:17 -0700, Jakub Kicinski wrote: > > > --- a/net/core/Makefile > > > +++ b/net/core/Makefile > > > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o > > > obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o > > > obj-$(CONFIG_DST_CACHE) += dst_cache.o > > > obj-$(CONFIG_HWBM) += hwbm.o > > > -obj-$(CONFIG_NET_DEVLINK) += devlink.o > > > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o > > > > This was looking super sexy up to here. This is networking specific. > > We want something generic for *anything* that requests firmware. > > You can't be serious. It's network specific because of how the Kconfig > is named? Wait, yeah, what? > Working for a company operating large data centers I would strongly > prefer if we didn't have ten different ways of reporting firmware > problems in the fleet. Agree. I don't actually operate anything, but still ... Thinking about this - maybe there's a way to still combine devcoredump and devlink somehow? Or (optionally) make devlink trigger devcoredump while userspace migrates? > > So networking may want to be aware that a firmware crash happened as > > part of this network device health thing, but firmware crashing is a > > generic thing. > > > > I have now extended my patch set to include uvents and I am more set on > > that we need the taint now more than ever. FWIW, I still completely disagree on that taint. You (Luis) obviously have been running into a bug in that driver, I doubt the firmware actually managed to wedge the hardware. But even if it did, that's still not really a kernel taint. The kernel itself isn't in any way affected by this. Yes, the system is in a weird state now. But that's *not* equivalent to "kernel tainted". > The irony is you have a problem with a networking device and all the > devices your initial set touched are networking. Two of the drivers > you touched either have or will soon have devlink health reporters > implemented. Like I said above, do you think it'd be feasible to make a devcoredump out of devlink health reports? And can the report be in a way that we control the file format, or are there limits? I guess I should read the code to find out, but I figure you probably just know. But feel free to tell me to read it :) The reason I'm asking is that it's starting to sound like we really ought to be implementing devlink, but we've got a bunch of infrastructure that uses the devcoredump, and it'll take time (significantly so) to change all that... johannes
On Fri, May 22, 2020 at 10:17:38AM -0700, Jakub Kicinski wrote: > On Fri, 22 May 2020 05:20:46 +0000 Luis Chamberlain wrote: > > > diff --git a/net/core/Makefile b/net/core/Makefile > > > index 3e2c378e5f31..6f1513781c17 100644 > > > --- a/net/core/Makefile > > > +++ b/net/core/Makefile > > > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o > > > obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o > > > obj-$(CONFIG_DST_CACHE) += dst_cache.o > > > obj-$(CONFIG_HWBM) += hwbm.o > > > -obj-$(CONFIG_NET_DEVLINK) += devlink.o > > > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o > > > > This was looking super sexy up to here. This is networking specific. > > We want something generic for *anything* that requests firmware. > > You can't be serious. It's network specific because of how the Kconfig > is named? Kconfig? What has that to do with anything? The issue I have is that the solution I am looking for is for it to be agnostic to the subsystem. I have found similar firmware crashes on gpu, media, scsci. > Working for a company operating large data centers I would strongly > prefer if we didn't have ten different ways of reporting firmware > problems in the fleet. Indeed. > > I'm afraid this won't work for something generic. I don't think its > > throw-away work though, the idea to provide a generic interface to > > dump firmware through netlink might be nice for networking, or other > > things. > > > > But I have a feeling we'll want something still more generic than this. > > Please be specific. Saying generic a lot is not helpful. The code (as > you can see in this patch) is in no way network specific. Or are you > saying there are machines out there running without netlink sockets? No, I am saying I want something to work with any struct device. > > So networking may want to be aware that a firmware crash happened as > > part of this network device health thing, but firmware crashing is a > > generic thing. > > > > I have now extended my patch set to include uvents and I am more set on > > that we need the taint now more than ever. > > Please expect my nack if you're trying to add this to networking > drivers. The uevent mechanism is not for networking. The taint however is, and I'd like to undertand how it is you do not see that an undesirable requirement for a reboot is a clear case for a taint. > The irony is you have a problem with a networking device and all the > devices your initial set touched are networking. Two of the drivers > you touched either have or will soon have devlink health reporters > implemented. That is all great, and I don't think its a bad idea to add infrastructure / extend it to get more information about a firmware crash dump. However, suggesting that devlink is the only solution we need in the kernel without considering other subsystems is what I am suggesting doesn't suit my needs. Networking was just the first subsystem I am taclking now but I have patches where similar situations happen across the kernel. Luis
On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote: > FWIW, I still completely disagree on that taint. You (Luis) obviously > have been running into a bug in that driver, I doubt the firmware > actually managed to wedge the hardware. This hasn't happened just once, its happed many times sporadically now, once a week or two weeks I'd say. And the system isn't being moved around. > But even if it did, that's still not really a kernel taint. The kernel > itself isn't in any way affected by this. Of course it is, a full reboot is required. > Yes, the system is in a weird state now. But that's *not* equivalent to > "kernel tainted". Requiring a full reboot is a dire situation to be in, and loosing connectivity to the point this is not recoverable likewise. You guys are making out a taint to be the end of the world. We have a taint even for a kernel warning, and as others have mentioned mac80211 already produces these. What exactly is the opposition to a taint to clarify that a device firmware has crashed and your system requires a full reboot? Luis
On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > > On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote: > > FWIW, I still completely disagree on that taint. You (Luis) obviously > > have been running into a bug in that driver, I doubt the firmware > > actually managed to wedge the hardware. > > This hasn't happened just once, its happed many times sporadically now, > once a week or two weeks I'd say. And the system isn't being moved > around. > > > But even if it did, that's still not really a kernel taint. The kernel > > itself isn't in any way affected by this. > > Of course it is, a full reboot is required. > > > Yes, the system is in a weird state now. But that's *not* equivalent to > > "kernel tainted". > > Requiring a full reboot is a dire situation to be in, and loosing > connectivity to the point this is not recoverable likewise. > > You guys are making out a taint to be the end of the world. We have a > taint even for a kernel warning, and as others have mentioned mac80211 > already produces these. > I had to go RTFM re: kernel taints because it has been a very long time since I looked at them. It had always seemed to me that most were caused by "kernel-unfriendly" user actions. The most famous of course is loading proprietary modules, out-of-tree modules, forced module loads, etc... Honestly, I had forgotten the large variety of uses of the taint flags. For anyone who hasn't looked at taints recently, I recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html In light of this I don't object to setting a taint on this anymore. I'm a little uneasy, but I've softened on it now, and now I feel it depends on implementation. Specifically, I don't think we should set a taint flag when a driver easily handles a routine firmware crash and is confident that things have come up just fine again. In other words, triggering the taint in every driver module where it spits out a log comment that it had a firmware crash and had to recover seems too much. Sure, firmware shouldn't crash, sure it should be open source so we can fix it, whatever... those sort of wishful comments simply ignore reality and our ability to affect effective change. A lot of WiFi firmware crashes and for well-known cases the drivers handle them well. And in some cases, not so well and that should be a place the driver should detect and thus raise a red flag. If a WiFi firmware crash can bring down the kernel, there's either a major driver bug or some very funky hardware crap going on. That sort of thing we should be able to detect, mark with a taint (or something), and fix if within our sphere of influence. I guess what it comes down to me is how aggressive we are about setting the flag. I would like there to be a single solution, or a minimized set depending on what makes sense for the requirements. I haven't had time to look into the alternatives mentioned here so I don't have an informed opinion about the solution. I do think Luis is trying to solve a real problem though. Can we look at this from the point of view of what are the requirements? What is it we're trying to solve? I _think_ that the goal of Luis's original proposal is to report up to the user, at some future point when the user is interested (because something super drastic just occured, but long after the fw crash), that there was a firmware crash without the user having to grep through all logs on the machine. And then if the user sees that flag and suspects it, then they can bother to find it in the logs or do more drastic debugging steps like finding the fw crash in the log and pulling firmware crash dumps, etc. I think the various alternate solutions are great but perhaps solving a superset of features (like adding in user-space notifications etc)? Perhaps different people on these related threads are trying to solve different problems? - Steve
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote: > Specifically, I don't think we should set a taint flag when a driver > easily handles a routine firmware crash and is confident that things > have come up just fine again. In other words, triggering the taint in > every driver module where it spits out a log comment that it had a > firmware crash and had to recover seems too much. Sure, firmware > shouldn't crash, sure it should be open source so we can fix it, > whatever... those sort of wishful comments simply ignore reality and > our ability to affect effective change. A lot of WiFi firmware crashes > and for well-known cases the drivers handle them well. And in some > cases, not so well and that should be a place the driver should detect > and thus raise a red flag. If a WiFi firmware crash can bring down > the kernel, there's either a major driver bug or some very funky > hardware crap going on. That sort of thing we should be able to > detect, mark with a taint (or something), and fix if within our sphere > of influence. I guess what it comes down to me is how aggressive we > are about setting the flag. Exactly the crux of the issue. I hope that by now we should all be in agreement that at least a firmware crash requiring a reboot is something we should record and inform the user of. A taint seems like a reasonable standard practice for these sorts of things. > I would like there to be a single solution, or a minimized set > depending on what makes sense for the requirements. I haven't had time > to look into the alternatives mentioned here so I don't have an > informed opinion about the solution. I do think Luis is trying to > solve a real problem though. Can we look at this from the point of > view of what are the requirements? What is it we're trying to solve? > > I _think_ that the goal of Luis's original proposal is to report up to > the user, at some future point when the user is interested (because > something super drastic just occured, but long after the fw crash), > that there was a firmware crash without the user having to grep > through all logs on the machine. And then if the user sees that flag > and suspects it, then they can bother to find it in the logs or do > more drastic debugging steps like finding the fw crash in the log and > pulling firmware crash dumps, etc. Yes, that's exactly it. Not all users are clueful to inspect logs. I now have a generic uevent mechanism drafted which sends a uevent for *any* taint. So that is, it does not even depend on this series. But it accomplishes the goal of informing the user of taints. > I think the various alternate solutions are great but perhaps solving > a superset of features (like adding in user-space notifications etc)? > Perhaps different people on these related threads are trying to solve > different problems? The uevent mechanism I implemented (but not yet posted for review) at least sends out a smoke signal. I think that if each subsystem wants to expand on this with dumping facilities that is great too! Luis
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote: > On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > I had to go RTFM re: kernel taints because it has been a very long > time since I looked at them. It had always seemed to me that most were > caused by "kernel-unfriendly" user actions. The most famous of course > is loading proprietary modules, out-of-tree modules, forced module > loads, etc... Honestly, I had forgotten the large variety of uses of > the taint flags. For anyone who hasn't looked at taints recently, I > recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html > > In light of this I don't object to setting a taint on this anymore. > I'm a little uneasy, but I've softened on it now, and now I feel it > depends on implementation. > > Specifically, I don't think we should set a taint flag when a driver > easily handles a routine firmware crash and is confident that things > have come up just fine again. In other words, triggering the taint in > every driver module where it spits out a log comment that it had a > firmware crash and had to recover seems too much. Sure, firmware > shouldn't crash, sure it should be open source so we can fix it, > whatever... While it may sound idealistic the firmware for the end-user, and even for mere kernel developer like me, is a complete blackbox which has more access than root user in the kernel. We have tons of firmwares and each of them potentially dangerous beast. As a user I really care about my data and privacy (hacker can oops a firmware in order to set a specific vector attack). So, tainting kernel is _a least_ we can do there, the strict rules would be to reboot immediately. > those sort of wishful comments simply ignore reality and > our ability to affect effective change. We can encourage users not to buy cheap crap for the starter. > A lot of WiFi firmware crashes > and for well-known cases the drivers handle them well. And in some > cases, not so well and that should be a place the driver should detect > and thus raise a red flag. If a WiFi firmware crash can bring down > the kernel, there's either a major driver bug or some very funky > hardware crap going on. That sort of thing we should be able to > detect, mark with a taint (or something), and fix if within our sphere > of influence. I guess what it comes down to me is how aggressive we > are about setting the flag.
On 05/25/2020 02:07 AM, Andy Shevchenko wrote: > On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote: >> On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > >> I had to go RTFM re: kernel taints because it has been a very long >> time since I looked at them. It had always seemed to me that most were >> caused by "kernel-unfriendly" user actions. The most famous of course >> is loading proprietary modules, out-of-tree modules, forced module >> loads, etc... Honestly, I had forgotten the large variety of uses of >> the taint flags. For anyone who hasn't looked at taints recently, I >> recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html >> >> In light of this I don't object to setting a taint on this anymore. >> I'm a little uneasy, but I've softened on it now, and now I feel it >> depends on implementation. >> >> Specifically, I don't think we should set a taint flag when a driver >> easily handles a routine firmware crash and is confident that things >> have come up just fine again. In other words, triggering the taint in >> every driver module where it spits out a log comment that it had a >> firmware crash and had to recover seems too much. Sure, firmware >> shouldn't crash, sure it should be open source so we can fix it, >> whatever... > > While it may sound idealistic the firmware for the end-user, and even for mere > kernel developer like me, is a complete blackbox which has more access than > root user in the kernel. We have tons of firmwares and each of them potentially > dangerous beast. As a user I really care about my data and privacy (hacker can > oops a firmware in order to set a specific vector attack). So, tainting kernel > is _a least_ we can do there, the strict rules would be to reboot immediately. > >> those sort of wishful comments simply ignore reality and >> our ability to affect effective change. > > We can encourage users not to buy cheap crap for the starter. There is no stable wifi firmware for any price. There is also no obvious feedback from even name-brand NICs like ath10k or AX200 when you report a crash. That said, at least in my experience with ath10k-ct, the OS normally recovers fine from firmware crashes. ath10k already reports full crash reports on udev, so easy for user-space to notice and report bug reports upstream if it cares to. Probably other NICs do the same, and if not, they certainly could. Thanks, Ben
On Fri, 22 May 2020 22:46:07 +0200 Johannes Berg wrote: > > The irony is you have a problem with a networking device and all the > > devices your initial set touched are networking. Two of the drivers > > you touched either have or will soon have devlink health reporters > > implemented. > > Like I said above, do you think it'd be feasible to make a devcoredump > out of devlink health reports? And can the report be in a way that we > control the file format, or are there limits? I guess I should read the > code to find out, but I figure you probably just know. But feel free to > tell me to read it :) > > The reason I'm asking is that it's starting to sound like we really > ought to be implementing devlink, but we've got a bunch of > infrastructure that uses the devcoredump, and it'll take time > (significantly so) to change all that... In devlink world pure FW core dumps are exposed by devlink regions. An API allowing reading device memory, registers, etc., but also creating dumps of memory regions when things go wrong. It should be a fairly straightforward migration. Devlink health is more targeted, the dump is supposed to contain only relevant information, selected and formatted by the driver. When device misbehaves driver reads the relevant registers and FW state and creates a formatted state dump. I believe each element of the dump must fit into a netlink message (but there may be multiple elements, see devlink_fmsg_prepare_skb()). We should be able to convert dl-regions dumps and dl-health dumps into devcoredumps, but since health reporters are supposed to be more targeted there's usually multiple of them per device. Conversely devcoredumps can be trivially exposed as dl-region dumps, but I believe dl-health would require driver changes to format the information appropriately.
Hi, Sorry ... long delay. > > The reason I'm asking is that it's starting to sound like we really > > ought to be implementing devlink, but we've got a bunch of > > infrastructure that uses the devcoredump, and it'll take time > > (significantly so) to change all that... > > In devlink world pure FW core dumps are exposed by devlink regions. > An API allowing reading device memory, registers, etc., but also > creating dumps of memory regions when things go wrong. It should be > a fairly straightforward migration. Right. We also have regions (various memory pieces, registers, ...). One issue might be that for devlink we wouldn't want to expose these as a single blob, I guess, but for devcoredump we already have a custom format to glue all the things together. Since it seems unlikely that anyone else would want to use the *iwlwifi* format to glue things together, we'd have to do something there. But perhaps that could be a matter of providing a "glue things into a devcoredump" function that would have a reasonable default but could be overridden by the driver for those migration cases. > Devlink health is more targeted, the dump is supposed to contain only > relevant information, selected and formatted by the driver. When device > misbehaves driver reads the relevant registers and FW state and creates > a formatted state dump. I believe each element of the dump must fit > into a netlink message (but there may be multiple elements, see > devlink_fmsg_prepare_skb()). That wouldn't help for our big memory dumps, but OK. > We should be able to convert dl-regions dumps and dl-health dumps into > devcoredumps, but since health reporters are supposed to be more > targeted there's usually multiple of them per device. Right. > Conversely devcoredumps can be trivially exposed as dl-region dumps, > but I believe dl-health would require driver changes to format the > information appropriately. Agree. Anyway, thanks. I'll put it on my list of things to look at ... not too hopeful that will be soon, given how long it even took me to get back to this email :) johannes
diff --git a/include/linux/devlink.h b/include/linux/devlink.h new file mode 100644 index 000000000000..2b73987eefca --- /dev/null +++ b/include/linux/devlink.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_DEVLINK_H_ +#define _LINUX_DEVLINK_H_ + +struct device; + +void devlink_simple_fw_reporter_prepare(struct device *dev); +void devlink_simple_fw_reporter_cleanup(struct device *dev); +void devlink_simple_fw_reporter_report_crash(struct device *dev); + +#endif diff --git a/net/core/Makefile b/net/core/Makefile index 3e2c378e5f31..6f1513781c17 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o obj-$(CONFIG_DST_CACHE) += dst_cache.o obj-$(CONFIG_HWBM) += hwbm.o -obj-$(CONFIG_NET_DEVLINK) += devlink.o +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o obj-$(CONFIG_GRO_CELLS) += gro_cells.o obj-$(CONFIG_FAILOVER) += failover.o obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o diff --git a/net/core/devlink_simple_fw_reporter.c b/net/core/devlink_simple_fw_reporter.c new file mode 100644 index 000000000000..48dde9123c3c --- /dev/null +++ b/net/core/devlink_simple_fw_reporter.c @@ -0,0 +1,101 @@ +#include <linux/devlink.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <net/devlink.h> + +struct devlink_simple_fw_reporter { + struct list_head list; + struct devlink_health_reporter *reporter; +}; + + +static LIST_HEAD(devlink_simple_fw_reporters); +static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex); + +static const struct devlink_health_reporter_ops simple_devlink_health = { + .name = "fw", +}; + +static const struct devlink_ops simple_devlink_ops = { +}; + +static struct devlink_simple_fw_reporter * +devlink_simple_fw_reporter_find_for_dev(struct device *dev) +{ + struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL; + struct devlink *devlink; + + mutex_lock(&devlink_simple_fw_reporters_mutex); + list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters, + list) { + devlink = priv_to_devlink(simple_devlink); + if (devlink->dev == dev) { + ret = simple_devlink; + break; + } + } + mutex_unlock(&devlink_simple_fw_reporters_mutex); + + return ret; +} + +void devlink_simple_fw_reporter_report_crash(struct device *dev) +{ + struct devlink_simple_fw_reporter *simple_devlink; + + simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev); + if (!simple_devlink) + return; + + devlink_health_report(simple_devlink->reporter, "firmware crash", NULL); +} +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_report_crash); + +void devlink_simple_fw_reporter_prepare(struct device *dev) +{ + struct devlink_simple_fw_reporter *simple_devlink; + struct devlink *devlink; + + devlink = devlink_alloc(&simple_devlink_ops, + sizeof(struct devlink_simple_fw_reporter)); + if (!devlink) + return; + + if (devlink_register(devlink, dev)) + goto err_free; + + simple_devlink = devlink_priv(devlink); + simple_devlink->reporter = + devlink_health_reporter_create(devlink, &simple_devlink_health, + 0, NULL); + if (IS_ERR(simple_devlink->reporter)) + goto err_unregister; + + mutex_lock(&devlink_simple_fw_reporters_mutex); + list_add_tail(&simple_devlink->list, &devlink_simple_fw_reporters); + mutex_unlock(&devlink_simple_fw_reporters_mutex); + + return; + +err_unregister: + devlink_unregister(devlink); +err_free: + devlink_free(devlink); +} +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_prepare); + +void devlink_simple_fw_reporter_cleanup(struct device *dev) +{ + struct devlink_simple_fw_reporter *simple_devlink; + struct devlink *devlink; + + simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev); + if (!simple_devlink) + return; + + devlink = priv_to_devlink(simple_devlink); + devlink_health_reporter_destroy(simple_devlink->reporter); + devlink_unregister(devlink); + devlink_free(devlink); +} +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_cleanup);
Add infra for creating devlink instances for a device to report fw crashes. This patch expects the devlink instance to be registered at probe time. I belive to be the cleanest. We can also add a devm version of the helpers, so that we don't have to do the clean up. Or we can go even further and register the devlink instance only once error has happened (for the first time, then we can just find out if already registered by traversing the list like we do here). With the patch applied and a sample driver converted we get: $ devlink dev pci/0000:07:00.0 Then monitor for errors: $ devlink mon health [health,status] pci/0000:07:00.0: reporter fw state error error 1 recover 0 [health,status] pci/0000:07:00.0: reporter fw state error error 2 recover 0 These are the events I triggered on purpose. One can also inspect the health of all devices capable of reporting fw errors: $ devlink health pci/0000:07:00.0: reporter fw state error error 7 recover 0 Obviously drivers may upgrade to the full devlink health API which includes state dump, state dump auto-collect and automatic error recovery control. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- include/linux/devlink.h | 11 +++ net/core/Makefile | 2 +- net/core/devlink_simple_fw_reporter.c | 101 ++++++++++++++++++++++++++ 3 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 include/linux/devlink.h create mode 100644 net/core/devlink_simple_fw_reporter.c