Message ID | 20170821.112747.1532639515902173100.davem@davemloft.net (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
I think with this patch from -rc6 the symptoms should be cured:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7

if that theory is right.
> I think with this patch from -rc6 the symptoms should be cured:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
>
> if that theory is right.

The result with 4.13-rc6 is positive but mixed: the messages about MSI-X
affinity masks are still there, but the rest of the detection works and
the driver is loaded successfully:

[ 29.924282] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
[ 29.924710] qla2xxx [0000:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x000000c100d00000.
[ 29.925581] qla2xxx 0000:10:00.0: can't allocate MSI-X affinity masks for 2 vectors
[ 30.483422] scsi host1: qla2xxx
[ 35.495031] qla2xxx [0000:10:00.0]-00fb:1: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[ 35.495274] qla2xxx [0000:10:00.0]-00fc:1: ISP2432: PCIe (2.5GT/s x4) @ 0000:10:00.0 hdma- host#=1 fw=7.03.00 (9496).
[ 35.495615] qla2xxx [0000:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x000000c100d04000.
[ 35.496409] qla2xxx 0000:10:00.1: can't allocate MSI-X affinity masks for 2 vectors
[ 35.985355] scsi host2: qla2xxx
[ 40.996991] qla2xxx [0000:10:00.1]-00fb:2: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[ 40.997251] qla2xxx [0000:10:00.1]-00fc:2: ISP2432: PCIe (2.5GT/s x4) @ 0000:10:00.1 hdma- host#=2 fw=7.03.00 (9496).
[ 51.880945] qla2xxx [0000:10:00.0]-8038:1: Cable is unplugged...
[ 57.402900] qla2xxx [0000:10:00.1]-8038:2: Cable is unplugged...

With Dave Miller's patch on top of 4.13-rc6, I see the following before both MSI-X messages:

irq_create_affinity_masks: nvecs[2] affd->pre_vectors[2] affd->post_vectors[0]
From: mroos@linux.ee
Date: Mon, 21 Aug 2017 22:20:22 +0300 (EEST)

>> I think with this patch from -rc6 the symptoms should be cured:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
>>
>> if that theory is right.
>
> The result with 4.13-rc6 is positive but mixed: the messages about MSI-X
> affinity masks are still there, but the rest of the detection works and
> the driver is loaded successfully:

Is this an SMP system?

I ask because the commit log message indicates that this failure is not
expected to ever happen on SMP.

We really need to root cause this.
>>> I think with this patch from -rc6 the symptoms should be cured:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
>>>
>>> if that theory is right.
>>
>> The result with 4.13-rc6 is positive but mixed: the messages about MSI-X
>> affinity masks are still there, but the rest of the detection works and
>> the driver is loaded successfully:
>
> Is this an SMP system?

Yes, T5120.
On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
> I ask because the commit log message indicates that this failure is
> not expected to ever happen on SMP.

I fear my commit message (but not the code) might be wrong.
irq_create_affinity_masks() can return NULL any time we don't have any
affinity masks to hand out. I've already had a discussion about this
elsewhere with Bjorn, and I suspect we need to kill the warning or move
it into irq_create_affinity_masks() so that it only fires for genuine
failure cases.

> We really need to root cause this.
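For context, here is a simplified sketch of the behaviour described above, paraphrased from the 4.13-era kernel/irq/affinity.c rather than copied verbatim: the same NULL return covers both "nothing to spread" and a real allocation failure, which is why the caller's warning can fire on a healthy system.

#include <linux/interrupt.h>
#include <linux/slab.h>

/* Simplified sketch of irq_create_affinity_masks(), not the verbatim source. */
struct cpumask *
irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
{
	struct cpumask *masks;

	/*
	 * If no vectors remain once the pre/post vectors are subtracted,
	 * there is nothing to spread: return NULL without any error.
	 * This is the case the qla2xxx probe runs into.
	 */
	if (nvecs == affd->pre_vectors + affd->post_vectors)
		return NULL;

	masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
	if (!masks)
		return NULL;	/* genuine allocation failure */

	/* ... spread the remaining vectors over the present CPUs ... */
	return masks;
}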
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 22 Aug 2017 08:35:05 +0200

> On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
>> I ask because the commit log message indicates that this failure is
>> not expected to ever happen on SMP.
>
> I fear my commit message (but not the code) might be wrong.
> irq_create_affinity_masks() can return NULL any time we don't have any
> affinity masks to hand out. I've already had a discussion about this
> elsewhere with Bjorn, and I suspect we need to kill the warning or move
> it into irq_create_affinity_masks() so that it only fires for genuine
> failure cases.

This is a rather large machine with 64 or more CPUs and several NUMA
nodes. Why wouldn't there be any affinity masks available?

That's why I want to root cause this.
>> On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
>>> I ask because the commit log message indicates that this failure is
>>> not expected to ever happen on SMP.
>>
>> I fear my commit message (but not the code) might be wrong.
>> irq_create_affinity_masks() can return NULL any time we don't have any
>> affinity masks to hand out. I've already had a discussion about this
>> elsewhere with Bjorn, and I suspect we need to kill the warning or move
>> it into irq_create_affinity_masks() so that it only fires for genuine
>> failure cases.
>
> This is a rather large machine with 64 or more CPUs and several NUMA
> nodes. Why wouldn't there be any affinity masks available?

T5120 with 1 slot and 32 threads total. I have not configured any NUMA
on it - is there any reason for that?
On Tue, Aug 22, 2017 at 09:31:39AM -0700, David Miller wrote:
>> I fear my commit message (but not the code) might be wrong.
>> irq_create_affinity_masks() can return NULL any time we don't have any
>> affinity masks to hand out. I've already had a discussion about this
>> elsewhere with Bjorn, and I suspect we need to kill the warning or move
>> it into irq_create_affinity_masks() so that it only fires for genuine
>> failure cases.
>
> This is a rather large machine with 64 or more CPUs and several NUMA
> nodes. Why wouldn't there be any affinity masks available?

The driver only asked for two MSI-X vectors, and marked both of them as
pre-vectors that should not be spread. So there is no vector left that we
actually want to spread.
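To make the "both vectors are pre-vectors" point concrete, here is a hypothetical sketch of the kind of request the driver ends up making; the function name and constants are illustrative, not copied from qla2xxx:

#include <linux/pci.h>

/*
 * Illustrative only: the driver asks for exactly two MSI-X vectors and
 * marks both as pre-vectors, so irq_create_affinity_masks() is left with
 * zero vectors to spread and returns NULL, which the PCI core currently
 * reports as "can't allocate MSI-X affinity masks".
 */
static int example_enable_msix(struct pci_dev *pdev)
{
	static const struct irq_affinity affd = {
		.pre_vectors = 2,	/* fixed-purpose vectors, never spread */
	};

	/* min == max == 2: every requested vector is a pre-vector. */
	return pci_alloc_irq_vectors_affinity(pdev, 2, 2,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
}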
From: Meelis Roos <mroos@linux.ee>
Date: Tue, 22 Aug 2017 19:33:55 +0300 (EEST)

>>> On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
>>>> I ask because the commit log message indicates that this failure is
>>>> not expected to ever happen on SMP.
>>>
>>> I fear my commit message (but not the code) might be wrong.
>>> irq_create_affinity_masks() can return NULL any time we don't have any
>>> affinity masks to hand out. I've already had a discussion about this
>>> elsewhere with Bjorn, and I suspect we need to kill the warning or move
>>> it into irq_create_affinity_masks() so that it only fires for genuine
>>> failure cases.
>>
>> This is a rather large machine with 64 or more CPUs and several NUMA
>> nodes. Why wouldn't there be any affinity masks available?
>
> T5120 with 1 slot and 32 threads total. I have not configured any NUMA
> on it - is there any reason for that?

Ok, 32 CPUs and 1 NUMA node, my bad :-)
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 22 Aug 2017 18:39:16 +0200

> On Tue, Aug 22, 2017 at 09:31:39AM -0700, David Miller wrote:
>>> I fear my commit message (but not the code) might be wrong.
>>> irq_create_affinity_masks() can return NULL any time we don't have any
>>> affinity masks to hand out. I've already had a discussion about this
>>> elsewhere with Bjorn, and I suspect we need to kill the warning or move
>>> it into irq_create_affinity_masks() so that it only fires for genuine
>>> failure cases.
>>
>> This is a rather large machine with 64 or more CPUs and several NUMA
>> nodes. Why wouldn't there be any affinity masks available?
>
> The driver only asked for two MSI-X vectors, and marked both of them as
> pre-vectors that should not be spread. So there is no vector left that we
> actually want to spread.

Ok, now it makes more sense, and yes the warning should be removed.
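One possible shape of that change, sketched under the assumption that the warning currently sits at the irq_create_affinity_masks() call site in the PCI MSI code; the helper name below is illustrative, not the real function in drivers/pci/msi.c:

#include <linux/interrupt.h>

/*
 * Sketch only: the call site stops warning when irq_create_affinity_masks()
 * returns NULL, because NULL is also the normal "nothing left to spread"
 * case.  A message for genuine allocation failures would be emitted inside
 * irq_create_affinity_masks() itself.
 */
static struct cpumask *example_msix_affinity_masks(int nvec,
						   const struct irq_affinity *affd)
{
	if (!affd)
		return NULL;

	/* NULL just means "no masks"; it is no longer reported as an error here. */
	return irq_create_affinity_masks(nvec, affd);
}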
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d69bd77252a7..d16c6326000a 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -110,6 +110,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	struct cpumask *masks;
 	cpumask_var_t nmsk, *node_to_present_cpumask;
 
+	pr_info("irq_create_affinity_masks: nvecs[%d] affd->pre_vectors[%d] "
+		"affd->post_vectors[%d]\n",
+		nvecs, affd->pre_vectors, affd->post_vectors);
 	/*
 	 * If there aren't any vectors left after applying the pre/post
 	 * vectors don't bother with assigning affinity.