Message ID | 20230712125455.1986455-4-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | blk-mq: fix wrong queue mapping for kdump kernel |
Hi Ming,

A few lines below in if (aff_mask), vectors can be overwritten again
with min(phba->cfg_irq_chann, cpu_cnt).

Perhaps we should move blk_mq_max_nr_hw_queues min comparison a little later:

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3221a934066b..20410789e8b8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13025,6 +13025,8 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 		flags |= PCI_IRQ_AFFINITY;
 	}

+	vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
+
 	rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
 	if (rc < 0) {
 		lpfc_printf_log(phba, KERN_INFO, LOG_INIT,

Regards,
Justin

On Wed, Jul 12, 2023 at 6:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Take blk-mq's knowledge into account for calculating io queues.
>
> Fix wrong queue mapping in case of kdump kernel.
>
> On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
> see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> still returns all CPUs because 'maxcpus=1' just bring up one single
> cpu core during booting.
>
> blk-mq sees single queue in kdump kernel, and in driver's viewpoint
> there are still multiple queues, this inconsistency causes driver to apply
> wrong queue mapping for handling IO, and IO timeout is triggered.
>
> Meantime, single queue makes much less resource utilization, and reduce
> risk of kernel failure.
>
> Cc: James Smart <james.smart@broadcom.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/lpfc/lpfc_init.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 3221a934066b..88314c3f1dc1 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -13012,6 +13012,7 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
>  	if (phba->irq_chann_mode != NORMAL_MODE)
>  		aff_mask = &phba->sli4_hba.irq_aff_mask;
>
> +	vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
>  	if (aff_mask) {
>  		cpu_cnt = cpumask_weight(aff_mask);
>  		vectors = min(phba->cfg_irq_chann, cpu_cnt);
> --
> 2.40.1
>
On Wed, Jul 12, 2023 at 05:03:19PM -0700, Justin Tee wrote:
> Hi Ming,
>
> A few lines below in if (aff_mask), vectors can be overwritten again
> with min(phba->cfg_irq_chann, cpu_cnt).
>
> Perhaps we should move blk_mq_max_nr_hw_queues min comparison a little later:
>
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 3221a934066b..20410789e8b8 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -13025,6 +13025,8 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
>  		flags |= PCI_IRQ_AFFINITY;
>  	}
>
> +	vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
> +
>  	rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
>  	if (rc < 0) {
>  		lpfc_printf_log(phba, KERN_INFO, LOG_INIT,

Hi Justin,

Indeed, will take it in next version.

Thanks,
Ming
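The ordering issue Justin points out can be illustrated with a small userspace model. This is only a sketch, not the driver code: `vectors_clamp_early()`, `vectors_clamp_late()`, and all of their parameters are hypothetical names, `initial_vectors` stands for whatever value the driver assigned to `vectors` before the hunk shown in the patch, and `max_hwq` stands for the return value of `blk_mq_max_nr_hw_queues()`, assumed here to be 1 in a kdump kernel as the commit message describes.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's min()/min_t() on unsigned int. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Model of the posted patch: the blk-mq clamp runs before the
 * aff_mask block, so the aff_mask assignment can overwrite it.
 */
static unsigned int vectors_clamp_early(unsigned int initial_vectors,
					unsigned int cfg_irq_chann,
					unsigned int aff_cpu_cnt,
					int has_aff_mask,
					unsigned int max_hwq)
{
	unsigned int vectors = initial_vectors;

	assert(max_hwq >= 1); /* at least one vector is always requested */

	vectors = min_uint(vectors, max_hwq);
	if (has_aff_mask)
		vectors = min_uint(cfg_irq_chann, aff_cpu_cnt); /* clamp lost */
	return vectors;
}

/*
 * Model of the suggested placement: the clamp runs after the
 * aff_mask block, immediately before vectors would be allocated.
 */
static unsigned int vectors_clamp_late(unsigned int initial_vectors,
				       unsigned int cfg_irq_chann,
				       unsigned int aff_cpu_cnt,
				       int has_aff_mask,
				       unsigned int max_hwq)
{
	unsigned int vectors = initial_vectors;

	assert(max_hwq >= 1);

	if (has_aff_mask)
		vectors = min_uint(cfg_irq_chann, aff_cpu_cnt);
	vectors = min_uint(vectors, max_hwq);
	return vectors;
}
```

With illustrative values of 16 configured channels, a 4-CPU affinity mask, and `max_hwq` of 1, the early placement yields 4 vectors when an affinity mask is in use, while the late placement yields 1. Without an affinity mask the two placements agree.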
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3221a934066b..88314c3f1dc1 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13012,6 +13012,7 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 	if (phba->irq_chann_mode != NORMAL_MODE)
 		aff_mask = &phba->sli4_hba.irq_aff_mask;

+	vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
 	if (aff_mask) {
 		cpu_cnt = cpumask_weight(aff_mask);
 		vectors = min(phba->cfg_irq_chann, cpu_cnt);
Take blk-mq's knowledge into account when calculating the number of io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, but num_possible_cpus()
still returns all CPUs because 'maxcpus=1' only brings up a single
cpu core during booting.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply the wrong queue mapping for handling IO, and an IO timeout
is triggered.

Meanwhile, a single queue uses far fewer resources and reduces the
risk of kernel failure.

Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 1 +
 1 file changed, 1 insertion(+)