Message ID | 20230708020259.1343736-3-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq & nvme-pci: fix io failure in kdump kernel | expand |
On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > Take blk-mq's knowledge into account for calculating io queues. > > Fix wrong queue mapping in case of kdump kernel. > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > still returns all CPUs. That's simply broken. Please fix the arch code to make sure it does not return a bogus num_possible_cpus value for these setups, otherwise you'll have to paper over it in all kind of drivers.
On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > Take blk-mq's knowledge into account for calculating io queues. > > > > Fix wrong queue mapping in case of kdump kernel. > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > still returns all CPUs. > > That's simply broken. Please fix the arch code to make sure > it does not return a bogus num_possible_cpus value for these That is documented in Documentation/admin-guide/kdump/kdump.rst. On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" simply keep one of CPU cores as online, and others as offline. So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for kdump kernel. > setups, otherwise you'll have to paper over it in all kind of > drivers. The issue is only triggered for drivers which use managed irq & multiple hw queues. Thanks, Ming
On Mon, Jul 10, 2023 at 05:14:15PM +0800, Ming Lei wrote: > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > still returns all CPUs. > > > > That's simply broken. Please fix the arch code to make sure > > it does not return a bogus num_possible_cpus value for these > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > simply keep one of CPU cores as online, and others as offline. > > So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for > kdump kernel. > > > setups, otherwise you'll have to paper over it in all kind of > > drivers. > > The issue is only triggered for drivers which use managed irq & > multiple hw queues. Is the problem that the managed interrupt sets the effective irq affinity to an offline CPU? You mentioned observed timeouts; are you seeing the "completion polled" nvme message?
On Mon, Jul 10, 2023 at 10:51:43AM -0600, Keith Busch wrote: > On Mon, Jul 10, 2023 at 05:14:15PM +0800, Ming Lei wrote: > > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > > still returns all CPUs. > > > > > > That's simply broken. Please fix the arch code to make sure > > > it does not return a bogus num_possible_cpus value for these > > > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > > simply keep one of CPU cores as online, and others as offline. > > > > So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for > > kdump kernel. > > > > > setups, otherwise you'll have to paper over it in all kind of > > > drivers. > > > > The issue is only triggered for drivers which use managed irq & > > multiple hw queues. > > Is the problem that the managed interrupt sets the effective irq > affinity to an offline CPU? You mentioned observed timeouts; are you Yes, the problem is that blk-mq only creates hctx0, so nvme-pci translate it into hctx0's nvme_queue, this way is actually wrong, cause blk-mq's view on queue topo isn't same with nvme's view. > seeing the "completion polled" nvme message? Yes, "completion polled" can be observed. Meantime the warning in __irq_startup_managed() can be triggered from nvme_timeout()->nvme_poll_irqdisable()->enable_irq(). Thanks, Ming
On 07/10/23 at 05:14pm, Ming Lei wrote: > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > still returns all CPUs. > > > > That's simply broken. Please fix the arch code to make sure > > it does not return a bogus num_possible_cpus value for these > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > simply keep one of CPU cores as online, and others as offline. I don't know maxcpus on arm and ppc64 well. But maxcpus=1 or nr_cpus=1 are suggested parameter. Because usually nr_cpus=1 is enough to make kdump kernel work well to capture vmcore. However, user is allowed to specify nr_cpus=n (n>1) if they think multiple cpus are needed in kdump kernel. Your hard coding of cpu number in kdump kernel may be not so reasonable. Please cc kexec mailing list when posting so that people can view the whole thread of discussion. Thanks Baoquan > > So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for > kdump kernel. > > > setups, otherwise you'll have to paper over it in all kind of > > drivers. > > The issue is only triggered for drivers which use managed irq & > multiple hw queues. > > > Thanks, > Ming > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
Hi Baoquan, On Tue, Jul 11, 2023 at 11:35:50AM +0800, Baoquan He wrote: > On 07/10/23 at 05:14pm, Ming Lei wrote: > > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > > still returns all CPUs. > > > > > > That's simply broken. Please fix the arch code to make sure > > > it does not return a bogus num_possible_cpus value for these > > > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > > simply keep one of CPU cores as online, and others as offline. > > I don't know maxcpus on arm and ppc64 well. But maxcpus=1 or nr_cpus=1 > are suggested parameter. Because usually nr_cpus=1 is enough to make > kdump kernel work well to capture vmcore. However, user is allowed to > specify nr_cpus=n (n>1) if they think multiple cpus are needed in kdump > kernel. Your hard coding of cpu number in kdump kernel may be not so > reasonable. As I mentioned, for arm/ppc64, passing 'maxcpus=1' actually follows Documentation/admin-guide/kdump/kdump.rst. 'nr_cpus=N' just works fine, so not related with this topic. After 'maxcpus=1' is passed, kernel only brings up one of cpu cores as online during booting, and others still can be put into online by userspace. Now this way causes IO timeout on some storage device which uses managed irq and supports multiple io queues. Here the focus is if passing 'maxcpus=1' is valid for kdump kernel, that is we want to hear from our arch/kdump guys. If yes, something needs to be fixed, such as, what this patchset is doing. > > Please cc kexec mailing list when posting so that people can view the > whole thread of discussion. Already Cc kexe & arm/powerpc & irq list. Thanks, Ming
On Mon, Jul 10, 2023 at 5:16 PM Ming Lei <ming.lei@redhat.com> wrote: > > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > still returns all CPUs. > > > > That's simply broken. Please fix the arch code to make sure > > it does not return a bogus num_possible_cpus value for these > In fact, num_possible_cpus is not broken. Quote from admin-guide/kernel-parameters.txt maxcpus= [SMP] Maximum number of processors that an SMP kernel will bring up during bootup. maxcpus=n : n >= 0 limits the kernel to bring up 'n' processors. Surely after bootup you can bring up the other plugged cpu by executing "echo 1 > /sys/devices/system/cpu/cpuX/online". So maxcpus only takes effect during system bootup. While n=0 is a special case, it is equivalent to "nosmp", which also disables the IO APIC. Here, as it explained, maxcpus only affects the bootup, later, extra cpus can be online. > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" On aarch64 and x86, nr_cpus=1 is used, while on ppc64, due to the implementation, nr_cpus=1 can not be supported. Thanks, Pingfan > simply keep one of CPU cores as online, and others as offline. > > So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for > kdump kernel. > > > setups, otherwise you'll have to paper over it in all kind of > > drivers. > > The issue is only triggered for drivers which use managed irq & > multiple hw queues. > > > Thanks, > Ming > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec >
Hi Ming, Having no [PATCH 1/2] blk-mq: add blk_mq_max_nr_hw_queues() in inbox. So I reply here. At first glance, I think that the cpu hot plug callback hook should be the remedy for the newly introduced blk_mq_max_nr_hw_queues(), although it is more complicated. Consider the scene where nr_cpus=4, which can speed up the dumping process, the blk_mq_max_nr_hw_queues() can not utilize the other three cpus. Thanks, Pingfan On Mon, Jul 10, 2023 at 5:16 PM Ming Lei <ming.lei@redhat.com> wrote: > > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > still returns all CPUs. > > > > That's simply broken. Please fix the arch code to make sure > > it does not return a bogus num_possible_cpus value for these > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > simply keep one of CPU cores as online, and others as offline. > > So Cc our arch(arm & ppc64) & kdump guys wrt. passing 'maxcpus=1' for > kdump kernel. > > > setups, otherwise you'll have to paper over it in all kind of > > drivers. > > The issue is only triggered for drivers which use managed irq & > multiple hw queues. > > > Thanks, > Ming > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec >
On 07/11/23 at 11:53am, Ming Lei wrote: > Hi Baoquan, > > On Tue, Jul 11, 2023 at 11:35:50AM +0800, Baoquan He wrote: > > On 07/10/23 at 05:14pm, Ming Lei wrote: > > > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote: > > > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote: > > > > > Take blk-mq's knowledge into account for calculating io queues. > > > > > > > > > > Fix wrong queue mapping in case of kdump kernel. > > > > > > > > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see > > > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() > > > > > still returns all CPUs. > > > > > > > > That's simply broken. Please fix the arch code to make sure > > > > it does not return a bogus num_possible_cpus value for these > > > > > > That is documented in Documentation/admin-guide/kdump/kdump.rst. > > > > > > On arm and ppc64, 'maxcpus=1' is passed for kdump kernel, and "maxcpu=1" > > > simply keep one of CPU cores as online, and others as offline. > > > > I don't know maxcpus on arm and ppc64 well. But maxcpus=1 or nr_cpus=1 > > are suggested parameter. Because usually nr_cpus=1 is enough to make > > kdump kernel work well to capture vmcore. However, user is allowed to > > specify nr_cpus=n (n>1) if they think multiple cpus are needed in kdump > > kernel. Your hard coding of cpu number in kdump kernel may be not so > > reasonable. > > As I mentioned, for arm/ppc64, passing 'maxcpus=1' actually follows > Documentation/admin-guide/kdump/kdump.rst. > > 'nr_cpus=N' just works fine, so not related with this topic. > > After 'maxcpus=1' is passed, kernel only brings up one of cpu cores as > online during booting, and others still can be put into online by > userspace. Now this way causes IO timeout on some storage device which > uses managed irq and supports multiple io queues. > > Here the focus is if passing 'maxcpus=1' is valid for kdump > kernel, that is we want to hear from our arch/kdump guys. Yes, 'maxcpus=1' is valid and suggested on ppc64 for kdump kernel if needed, because there's no 'nr_cpus=' support on ppc64 yet. > > If yes, something needs to be fixed, such as, what this patchset is > doing. > > > > > Please cc kexec mailing list when posting so that people can view the > > whole thread of discussion. > > Already Cc kexe & arm/powerpc & irq list. > > > Thanks, > Ming >
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 72725729cb6c..cb13ba203956 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2247,7 +2247,7 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev) */ if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS) return 1; - return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues; + return blk_mq_max_nr_hw_queues() + dev->nr_write_queues + dev->nr_poll_queues; } static int nvme_setup_io_queues(struct nvme_dev *dev)
Take blk-mq's knowledge into account for calculating io queues. Fix wrong queue mapping in case of kdump kernel. On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus() still returns all CPUs. But blk-mq can only support single queue for kdump kernel, this way causes wrong queue mapping taken for handling IO, and IO timeout is triggered. Meantime, single queue makes much less resource utilization, and reduce risk of kernel failure. Reported-by: Wen Xiong <wenxiong@linux.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)